Br Fixations of the HIV-1 env Gene Refute Neutralism: New Evidence for Pan-selective Evolution

We examined 103 nucleotide sequences of the HIV-1 env gene, sampled from 35 countries, and tested: I) the random (neutral) distribution of the number of nucleotide changes; II) the proportion of bases at molecular equilibrium; III) the neutral expected homogeneity of the distribution of new fixated bases; IV) the hypothesis of the neighbor influence on the mutation rates in a site. The expected random number of fixations per site was estimated by Bose-Einstein statistics, and the expected frequencies of bases by matrices of mutation-fixation rates. The homogeneity of new fixations was analyzed using χ² and trinomial tests for homogeneity. Fixations of the central base in trinucleotides were used to test the neighbor influence on base substitutions. Neither the number of fixations nor the frequencies of bases fitted the expected neutral distribution. There was a highly significant heterogeneity in the distribution of new fixations, and several sites showed more transversions than transitions, showing that each nucleotide site has its own pattern of change. These three independent results make the neutral theory, the nearly neutral and the neighbor influence hypotheses untenable and indicate that evolution of env is rather highly selective.


INTRODUCTION
The Neutral Theory of evolution proposed that the evolutionary state (fixation, loss or polymorphism) of most alleles in loci is acquired and maintained by mutation and random genetic drift (Kimura, 1968; King and Jukes, 1969; Kimura, 1979, 1991, 1993). Later on, this same proposal about alleles was extended to nucleotide bases in nucleotide sites. Under the neutralist viewpoint, most alleles or bases are selectively equivalent (relative selection coefficients of alleles or bases are zero, or complementarily, the relative fitness is 1; Crow and Kimura, 1970) and are rarely acquired or maintained by positive selection; purifying (lethal and sub-lethal) selection occurs infrequently. Thus neutral fixation is a fundamental concept for the Neutral Theory. The value of «rarely» has never been given and we will assume it means lower than 5%. Pure neutralism could not be supported based on results from studies of synonymous or non-synonymous codon replacement (Kreitman, 1996a; Ohta, 1996; Nei, 2005) and comparative genomics (Clark, 2006). Near-neutralism replaced neutralism, but it also presents unviable expectations. Near-neutralism accepts selective processes with selection coefficients of the order of the mutation rate or the reciprocal of the population size (Crow and Kimura, 1970; Kreitman, 1996a, 1996b; Ohta, 1996; Nei, 2005). Neutralism and near-neutralism entail an isotropic expected distribution of mutants in nucleotide sites along the DNA (Valenzuela, 2009). This expectation has been conclusively refuted in known genomes by isochores and signatures that have been maintained for billions of generations (Bernardi, 1993; Karlin and Mrazek, 1997; Valenzuela, 1997, 2009; Mrazek and Karlin, 2007). As well, the sequence of nucleotides is not what would be expected by random mutation (Gatlin, 1976; Valenzuela, 2009). Neutralists counter-argued that there was not sufficient time for base mutation rates (of the order of 10⁻⁸ mutation/site/generation = m/s/g) to reach equilibrium, and that the anisotropic distribution of bases could be the result of the influence of a base's neighborhood on the mutation rate at a specific nucleotide site (Jukes, 1976; Kimura and Ohta, 1977). However, neutralism, near-neutralism and the neighbor influence hypothesis still imply an isotropic random distribution of bases in DNA segments. By refuting DNA (or RNA) isotropy, we conclusively refuted neutralism, near-neutralism and the neighbor hypothesis (Valenzuela, 2009). This auxiliary hypothesis (neighborhood) can also be tested by maintaining a specific fixed neighborhood [for example, mutation analysis of the central adenine c(A) of the triplet AAA, c(T) of TTT, c(G) of GGG and c(C) of CCC], as we shall test in this article.
Precision is needed at this point. The different bases (amino acids) found in a nucleotide site (amino acid locus) in different lineages (strains or species) have been erroneously named substitutions or replacements. They are fixations or substitutions that reached a frequency of 100% in that specific lineage (strain, population or clone for parasites within one host), some time ago, and so remained until they were sampled. It is clear that fixation (permanence) is antithetical to substitution (continuous replacement; Valenzuela and Santos, 2006; Valenzuela, 1997, 2000, 2002, 2007, 2009). The substitution rate is a turnover rate, which for neutral mutations is (non-dimensionally) equal to the value of the mutation rate (which is also a turnover rate). Dimensionality is important because it indicates different processes. The dimension of mutations is mutation/site/generation = m/s/g. Since the probability that a mutation reaches the substitution level (frequency 1.0) has the dimension substitution/mutation (sub/m), and the equality of both rates was obtained from their product (King and Jukes, 1969), we have m/s/g × sub/m = sub/s/g; that is, the rate of neutral mutation equals numerically the rate of neutral substitution, but they are different processes (substitutions include copies, population diffusion, etc.). Fixations cannot be expressed dimensionally "per generation", because they remain an undefined number of generations in that taxon [only fixations/(site or locus)]. Contrary to any observation, recurrent neutral substitutions in a site are expected to lead always to polymorphisms, not to fixations. Fixations can only be generated and maintained by selection, even in populations with only one bacterium (Valenzuela and Santos, 1996; Valenzuela, 1997, 2000, 2002, 2007, 2009). Since this conceptual mistake is generalized (King and Jukes, 1969; Nei, 2005; Clark, 2006), we use fixation and substitution properly and never as synonyms.
Here we use env gene sequences from 103 HIV strains to test the random distribution of bases because we assume that neutral evolution implies a random distribution of mutations, substitutions or fixations. Under neutralism, viruses with high mutation rates are expected to rapidly reach the equilibrium of base frequencies. For neutral evolution, this equilibrium is attained when the four bases have frequencies near 0.25 (Jukes and Cantor, 1969; Valenzuela and Santos, 1996; Li, 1997). For nearly neutral evolution, the equilibrium frequency of the positively selected base is near 0.43 and those of the other three (negatively selected) bases are near 0.19. Although these neutral or nearly neutral expected frequencies for large populations have never been found, and most nucleotide sites remain monomorphic, the neutral and nearly neutral theories are not considered to have been definitively refuted, probably because several misconceptions about neutral evolution prevent these contradictions from being seen (Valenzuela, 2000, 2002, 2007, 2009). Our aim is to test the neutralism, near-neutralism and neighborhood hypotheses by using independent tests for the random distribution of bases in the HIV-1 env gene, and to estimate how far the distribution of fixations is from randomness.

Virus Sequences and Subtypes (Strains)
We analyzed 103 sequences belonging to 35 subtypes of HIV-1, a single-stranded RNA lentivirus (subfamily) of the Retroviridae family (Fauci and Lane, 2001), retrieved from GenBank (http://www.ncbi.nlm.nih.gov/Genbank/HIV; see accession numbers in APPENDIX 1). They were sampled between 1984 and 2001 and chosen, as far apart as possible, from 35 countries of the five continents, in order to avoid over-representation of sequences with the same recent origin and to increase the variability of origin. Consequently, according to neutralism and taking into account high mutation rates, a large number of sites should be highly polymorphic for A, T, G and C. The DNA segment that codes for the envelope protein (env GP120) was aligned by using ClustalX (1.8) (http://www-igbmc.u-strasbg.fr/BioInfo/). The consensus fragment had 3,239 nucleotide sites, including deletions, insertions and ambiguous bases of the 103 strains. We worked with sites whose bases were determined unambiguously among the 103 strains (2,105 sites), but numeration from the 1st to the 3,239th site was conserved. The longest DNA segment was present in the sequence AF119820 with 2,627 sites: 901 A (34.3%), 636 T (24.2%), 621 G (23.6%) and 469 C (17.9%). This sequence was taken as a reference to estimate the expected number of bases and triplets and the correlation between neighbor sites (also used in Valenzuela, 2009). We demonstrated previously that the correlation of the bases between pairs of sites discriminated between consecutive sites (0-site separation) and pairs separated by 1, 2, 3 or more sites (Valenzuela, 2009).

Rationale to Study the Distribution of Fixations
Our aim is to test whether evolution is neutral (or nearly neutral) or selective. If evolution is neutral (or nearly neutral), then the distribution of mutations, substitutions and fixations (or complementarily losses) on nucleotide sites can be expected to be random (or nearly random). If evolution is selective, the distribution of mutations, substitutions and fixations on loci or sites can be expected either to deviate from randomness or to be random. Thus, finding a non-random distribution of fixations refutes neutral and nearly neutral evolution, whereas finding a random distribution of fixations is compatible with neutral or nearly neutral evolution but does not refute selective evolution. We therefore use the powerful modus tollens logical method to refute neutral evolution: the proposition p (neutral evolution) implies q (random distribution of fixations); if q turns out to be false, then p is necessarily false. This logical method is conclusive. That is not the case when searching for adaptive conditions, where the fallacy of modus ponens is always present: p (selective processes) implies q (molecular constraints or correlations with structure or functions); we find those constraints or correlations, but this does not mean that p is true.
The study of the distribution of fixations at any DNA or RNA site of a genome or genome segment is a pre-transcriptional test for the evolutionary mode that also tests most transcriptional or post-transcriptional deviations from randomness. This occurs because an amino acid constraint of a protein is mostly a biased composition of amino acids of this protein, that is, a non-random distribution of amino acids among all the possible distributions of amino acids this protein can have. A non-random distribution of amino acids implies a non-random distribution of codons, which implies a non-random distribution of messenger RNA nucleotides, and this in turn implies a non-random distribution of DNA nucleotides. Our search for the deviation from randomness of the DNA nucleotides includes most amino-acid constraints correlated with a non-random distribution of nucleotides. However, our study also includes all the pre-transcriptional selective causes correlated with non-random nucleotide distribution. We are not testing or searching for the particular causes of the non-randomness found in the distribution of fixations, whether they come from molecular mechanics (Hamacher, 2008), structure or functions of proteins, drug or immune resistance, differences in codon positions or in synonymous or non-synonymous substitutions, or any other transcriptional or post-transcriptional process that implies clearly selective or adaptive conditions (Chen et al., 2004). These features or molecular traits correspond to one or to a few amino acid or nucleotide sequences among the huge number of all the possible ones that are included in our basic analyses of all the possible fixations. Most of these invariable structures or functions have been considered as undebatable "constraints" (Chen et al., 2004; Valenzuela, 2007, 2009). However, the acquisition and maintenance of these constraints should be determined prior to any analysis, but at present this is not done at all [for example: Was the genetic code (or any constraint) acquired by drift or selection? Was it (were they) maintained by drift?]. Thus, if we find a basic non-random distribution of fixations, we refute neutral and nearly neutral evolution, and our present purpose is satisfied; identifying the transcriptional or post-transcriptional causes of the acquisition and maintenance of these non-random distributions is unnecessary. Other research should address these aims. If we find random distributions of fixations, then, since selective evolution is still possible (type II epistemic and statistical error), these studies will be necessary to complete the demonstration; if we find a major deviation from random distribution, they could only increase the power of our demonstration by a little. However, our previous longitudinal analyses of the total HIV-1 genome and the longest env GP120 gene sequences (Valenzuela, 2009) demonstrated a degree of deviation from random distributions, so starting with this hypothesis is consistent with that result.
As well, our analyses address the dynamic evolution of a nucleotide site (not a base or an allele). If we know the history of processes (mutations, substitutions or fixations) that occurred in a site, the conditions of coding or non-coding (for amino acids), synonymous or non-synonymous substitutions, or other transcriptional or post-transcriptional properties are irrelevant, because neutral evolution (the null hypothesis) assumes that these processes are equally distributed, for evolutionary purposes, on the sites that lead to these properties. Unfortunately, this history cannot be known directly, and the method of sequence alignment should be used as a sufficient and acceptable determination of sites where these processes occur. However, insertions, deletions and micro-arrangements (micro-inversions) can confound sites and lead alignment methods to blur the true site history. This is why the choice among different methods of alignment (parsimony, maximum likelihood, Bayesian, etc.) is also irrelevant for our purpose. Moreover, our method tolerates confounding of sites because the history of sites, under neutral evolution, is expected to be equal on average. Our method was prepared for the analysis of complete genomes or genome segments regardless of the distinction between coding and non-coding regions and for longitudinal analyses of sequences (Valenzuela, 2009).
We chose the env gene because it shows more fixations than other HIV-1 proteins. According to the neutral theory, a gene with more polymorphic sites (more substitutions, mutations or fixations) is a better candidate for being neutral than a gene that is highly monomorphic due to functional constraints (Kimura, 1979), in spite of the contradictory proposition that more varied fixations imply a more selective genome region. The number of sites having a specific ancestral base was estimated by assuming that the ancestral base was the most frequent base in the site among the 103 strains used here. Ancestral states were also reconstructed using the parsimony criterion as implemented in PAUP (Swofford, 2002). Results from both approaches were convergent, and this approach gives an advantage to neutral and nearly neutral evolution. For example, we found 741 sites with A as the ancestral base (ancestral fixations) that presented 8,524 different new fixations among the 103 strains; so our expected random distribution of 8,524 fixations on 741 sites has an expected mean equal to 11.5 fixations per site. To calculate the expected number of sites with 0, 1, 2, ... fixations, we need the following analysis.
I) Bose-Einstein distribution analyses for nucleotide sites having 0, 1, 2, 3, ..., n fixations different from the ancestral base: We assume that neutral base mutations that originate neutral base fixations, by drift, are evolutionarily indistinguishable events that occur in evolutionarily distinguishable nucleotide sites. The random distribution of indistinguishable balls (events) in distinguishable boxes follows Bose-Einstein statistics (Feller, 1968; Valenzuela, 2009). Thus, the neutral distribution of fixations found in the 103 strains can be tested by Bose-Einstein statistics (Valenzuela, 2009). Because this study assumes neutral evolution as the null hypothesis and compares the inter-site frequency vector of fixations, it is not affected by phylogenetic relationships, the method of alignment and of assigning the ancestral base, mutation rates, the chosen segment of DNA, the codon position, or other evolutionary conditions. This is because any of these conditions is equal for all the sites, regardless of the gene to which they belong or the coding position, as we showed for longitudinal analyses of base sequences (Valenzuela, 2009). It is important to realize that the present study is a pre-transcriptional analysis. In the case of the distribution of fixations among sites with A as the ancestral base, we have: 59.2 expected sites with 0 new fixations; 54.5 expected sites with 1; 50.1 with 2; … and so on.
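The expected counts quoted for ancestral A can be reproduced with a short sketch. It assumes the standard Bose-Einstein occupancy formula (Feller, 1968) for r indistinguishable fixations placed on n distinguishable sites, P(k) = C(n+r-k-2, r-k) / C(n+r-1, r); the function names are illustrative and are not taken from the original analysis.

```python
from fractions import Fraction
from math import comb


def be_occupancy_probability(n_sites: int, n_fixations: int, k: int) -> Fraction:
    """P(a given site holds exactly k fixations) under Bose-Einstein statistics."""
    if k < 0 or k > n_fixations:
        return Fraction(0)
    if n_sites == 1:
        return Fraction(1 if k == n_fixations else 0)
    # Ways to place the remaining (r - k) fixations on the other (n - 1) sites,
    # divided by all ways to place r fixations on n sites.
    favourable = comb(n_sites + n_fixations - k - 2, n_fixations - k)
    total = comb(n_sites + n_fixations - 1, n_fixations)
    return Fraction(favourable, total)


def expected_sites(n_sites: int, n_fixations: int, k: int) -> float:
    """Expected number of sites carrying exactly k new fixations."""
    return float(n_sites * be_occupancy_probability(n_sites, n_fixations, k))


if __name__ == "__main__":
    # Ancestral-A values from the text: 741 sites, 8,524 new fixations.
    for k in range(3):
        print(k, round(expected_sites(741, 8524, k), 1))
```

With 741 sites and 8,524 fixations this yields approximately 59.2, 54.5 and 50.1 expected sites for k = 0, 1 and 2, in agreement with the values given in the text.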
Independently, neutral evolution can be tested by the base composition at equilibrium, with the frequency of bases in the DNA or RNA segment, or at disequilibrium by examining the tendency of the frequencies of new fixations in relation to the frequencies at equilibrium, as follows:
II) Analysis of the expected frequencies of bases at equilibrium and disequilibrium: They were obtained by applying the matrix of base mutation (Nei, 1987; see APPENDIX 2). This is the matrix that gives the mutation rate of a base into any base; for example, the first row shows the rates of mutation of A into A (it remains unchanged), T, G or C. However, we do not have mutation rates, or substitution rates, but fixation rates. Thus, we homologated the fixation matrix with the mutation matrix to take advantage of its mathematical properties. One of these properties is the expected base frequency at equilibrium. When a large number of base changes have occurred, these frequencies are equal for neutral mutation and fixation rates, provided that an exact proportionality between mutation and fixation rates is conserved (this is the neutralist expectation). Since HIV-1 mutates quickly, we may assume that its bases at any site have reached equilibrium. However, to cover all the cases (equilibrium and disequilibrium), we tested the equilibrium base frequencies against the ancestral base frequencies, the observed base frequencies, the longest segment, and the base distribution among new fixations (the tendency at disequilibrium). Appendix 2 shows that fixation rates ranged from 0.01 to 0.09 [f/s/(set of data)]. Note that fixations do not have generation as a dimensional variable, unlike mutations and substitutions. Moreover, viral mutation rates that range from 0.0001 to 0.00001 (m/s/g) demonstrate that mutation rates are very different from fixation rates. Basic population genetics demonstrated that in 10/m generations (m = mutation rate) the equilibrium of gene frequencies is attained with an error less than 5×10⁻⁵ (Valenzuela and Santos, 1996). If we assume that HIV-1 has more than 2/m generations per year, in five years the virus sequences should reach the equilibrium of frequencies.
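A minimal sketch of how equilibrium frequencies follow from a base-change matrix is given here. It is not the Appendix 2 computation: the matrix values are hypothetical (chosen only to lie in the 0.01 to 0.09 range mentioned above), the equilibrium is found by simple repeated multiplication, and the convergence check assumes that the deviation from equilibrium shrinks by a factor (1 - m) per generation.

```python
def equilibrium_frequencies(rates, tol=1e-12, max_iter=1_000_000):
    """Stationary base frequencies of a square matrix of per-step change rates.

    rates[i][j] is the probability that base i changes into base j (i != j);
    the diagonal is filled in so that every row sums to 1.
    """
    n = len(rates)
    p = [[rates[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        p[i][i] = 1.0 - sum(p[i])
    freq = [1.0] + [0.0] * (n - 1)  # start far from equilibrium (all first base)
    for _ in range(max_iter):
        new = [sum(freq[i] * p[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, freq)) < tol:
            return new
        freq = new
    return freq


def residual_deviation(m, generations):
    """Fraction of the initial deviation left, assuming decay by (1 - m) per generation."""
    return (1.0 - m) ** generations


if __name__ == "__main__":
    # Equal rates (Jukes-Cantor-like): equilibrium is 0.25 for each base.
    equal = [[0.03] * 4 for _ in range(4)]
    print(equilibrium_frequencies(equal))
    # Hypothetical unequal rates, rows and columns ordered A, T, G, C.
    hypothetical = [
        [0.00, 0.02, 0.05, 0.01],
        [0.03, 0.00, 0.02, 0.06],
        [0.09, 0.02, 0.00, 0.01],
        [0.03, 0.08, 0.01, 0.00],
    ]
    print(equilibrium_frequencies(hypothetical))
    # After 10/m generations the remaining deviation is about e^-10.
    print(residual_deviation(1e-4, 100_000))
```

Under the stated decay assumption, the residual after 10/m generations is close to e⁻¹⁰ ≈ 4.5×10⁻⁵, which is consistent with the bound of 5×10⁻⁵ quoted in the text.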
By examining the homo-heterogeneity distribution of fixations, we can perform a third independent test for neutralism and neutral neighbor influence.
III) Testing the homogeneity of new fixations from one ancestral base to the other three in a context of three consecutive bases (neighborhood influence): For this test, we first assumed that the neutralist neighbor influence is true and studied new base fixations in a site within its consecutive context of one upstream and one downstream site (a base triplet). Second, we assumed that most (95% or more) or all mutations are neutral, so their recurrent "mutation rates" are equal to their (recurrent) "substitution rates", as neutralists demonstrated (King and Jukes, 1969; Hey, 1999), and, we add, equal to neutral "fixation rates". Third, we assumed that the best estimate of the neutral fixation rate, in a set of sites, is given by the total new fixations from the ancestral base among all the virus strains in the set of nucleotide sites. These three assumptions (besides that of parsimony, for assigning the ancestral base) concede the maximal advantage to the neutralist model, although we have demonstrated they are theoretically and empirically unsupported (Valenzuela and Santos, 1996; Valenzuela, 1997, 2000, 2002, 2007, 2009). Thus, the test consists of a comparison between the observed fixations in each site (or subset of sites) and the expected neutralist values estimated from the whole set of fixations. This test is also independent of the alignment method, the chosen DNA segment (or protein), the codon position and phylogenetic relationships. This occurs because we are again comparing the vector of fixations in a site to the vector of the other sites; that is, this is an inter-site comparison, and all these sources of possible differences are equal for all the sites. With the observed proportion of new fixated bases in the set of sites, we calculated the exact trinomial probability of the distribution (or a more extreme one) in each site of this set. Our procedure is like the one-tailed Fisher's exact test (Maxwell, 1961). We used the log-likelihood ratio χ²ₖ test (k = degrees of freedom) to study the heterogeneity of substitution rates among sets of sites with small (<5) expected numbers (Howell, 2002).
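The exact trinomial calculation can be sketched as follows. This is one plausible reading, not necessarily the authors' exact procedure: "a more extreme distribution" is taken here to mean any outcome whose probability is no larger than that of the observed one, and the function names are hypothetical.

```python
from math import factorial


def trinomial_pmf(counts, probs):
    """Probability of observing `counts` (three categories) given `probs`."""
    n = sum(counts)
    coef = factorial(n) // (
        factorial(counts[0]) * factorial(counts[1]) * factorial(counts[2])
    )
    return (
        coef
        * probs[0] ** counts[0]
        * probs[1] ** counts[1]
        * probs[2] ** counts[2]
    )


def trinomial_exact_p(observed, probs):
    """Sum of probabilities of all outcomes no more probable than the observed one."""
    n = sum(observed)
    p_obs = trinomial_pmf(observed, probs)
    total = 0.0
    for a in range(n + 1):
        for b in range(n - a + 1):
            p = trinomial_pmf((a, b, n - a - b), probs)
            # Small relative tolerance so ties are not lost to rounding.
            if p <= p_obs * (1 + 1e-9):
                total += p
    return min(total, 1.0)


if __name__ == "__main__":
    # Hypothetical site: 20 new fixations from ancestral A, observed as
    # (T, G, C) = (12, 5, 3), against pooled proportions (0.22, 0.56, 0.22).
    print(trinomial_exact_p((12, 5, 3), (0.22, 0.56, 0.22)))
```

Here `probs` plays the role of the pooled proportions of the three possible new bases in the whole set of sites, and `observed` the counts at a single site.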
We know that it is difficult for a reader accustomed to the standard studies to understand that our analyses are independent of the method of alignment, assignment of the ancestral base, phylogeny, codon position and any condition (especially transcriptional and post-transcriptional ones) that is equal (neutral) for every site. This is because we are testing neutral or nearly neutral evolution by comparing fixations in a site to the expected vector of fixations estimated from all the sites and, under the neutral or nearly neutral assumptions, all the sites are expected to behave equally for this analysis and for alignment methods. Thus, the expected distribution of neutral or nearly neutral fixations (which is equal to that of neutral or nearly neutral mutations) is the same for every site. We prepared an example of this procedure based on Appendix 3. It shows 9 consecutive sites with very different vectors of fixations. The five central sites have adenine as the ancestral base. Thus, the three central sites correspond to AAA triplets (equal neighborhood) of consecutive equal ancestral bases. These three central sites show very different fixation vectors. Only site 1804 agrees with the expected transition > transversion fixations. The contiguous sites 1805 and 1806 showed more transversions than transitions, but 1805 had more Thymine and 1806 more Cytosine. The χ²₄ test for the homogeneous distribution of fixations among these 3 consecutive sites was 87.65 (p << 10⁻⁶). Since these represent 103 different virus strains from 5 continents and 35 countries, the data show evolutionary convergence towards the distribution of fixations in the entire world. The χ²₈ log-likelihood test (some cells have expected values under 5) for the homogeneous distribution of fixations among the 5 central consecutive A sites was 101.23 (p << 10⁻⁶). However, the flanking A sites 1803 and 1807 are closer to monomorphism than to polymorphism. If we accept that the mutation and fixation rates are those of the 3 central ancestral A sites, it is not possible to account for the low mutation and fixation rates of the flanking ancestral A sites. If we explain this difference by the neighbor influence of non-A in the sites 1803 and 1807, we cannot account for the heterogeneous distribution of fixations in the 3 central A sites with the AAA neighborhood. It is evident that the codon position, the precision of alignment, as well as other transcriptional or phylogenetic conditions of these 9 sites, are irrelevant for the refutation of the neutral and nearly neutral theory or the neighbor influence hypothesis.
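The homogeneity comparison among consecutive sites can be illustrated with a log-likelihood ratio (G) statistic on a sites × new-base table. The sketch below uses hypothetical counts, not the Appendix 3 data; it only shows how the statistic and its degrees of freedom arise (3 sites × 3 new bases gives 4 degrees of freedom, 5 sites gives 8, as in the tests reported above).

```python
from math import log


def g_test_homogeneity(table):
    """Log-likelihood ratio statistic and degrees of freedom for a count table.

    Each row is a site; each column is one of the possible new bases.
    Empty rows and columns are dropped before computing the statistic.
    """
    rows = [list(r) for r in table if sum(r) > 0]
    col_totals = [sum(col) for col in zip(*rows)]
    keep = [j for j, t in enumerate(col_totals) if t > 0]
    rows = [[r[j] for j in keep] for r in rows]
    col_totals = [col_totals[j] for j in keep]
    grand = sum(col_totals)
    g = 0.0
    for r in rows:
        row_total = sum(r)
        for obs, col_total in zip(r, col_totals):
            if obs > 0:
                # Expected count under homogeneity: row_total * col_total / grand.
                g += 2.0 * obs * log(obs * grand / (row_total * col_total))
    df = (len(rows) - 1) * (len(keep) - 1)
    return g, df


if __name__ == "__main__":
    # Hypothetical new-fixation counts (T, G, C) at three ancestral-A sites.
    hypothetical_sites = [
        [5, 30, 4],   # mostly transitions (A to G)
        [28, 10, 6],  # mostly transversions to T
        [4, 9, 25],   # mostly transversions to C
    ]
    print(g_test_homogeneity(hypothetical_sites))
```

The resulting statistic is compared with a χ² distribution with the returned degrees of freedom; a p-value would require a χ² survival function, which is omitted here to keep the sketch dependency-free.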

I) Bose-Einstein analyses
The number of ancestral sites (AS_K), the total number of new fixations (TF_K) and the average number of new fixations per site (AF_K), according to the ancestral base K (K subscript), were: for A, AS_A = 741, TF_A = 8,524, AF_A = 11.5; for T, AS_T = 526, TF_T = 4,561, AF_T = 8.7; for G, AS_G = 461, TF_G = 5,246, AF_G = 11.4; for C, AS_C = 373, TF_C = 4,608, AF_C = 12.4. Table 1 shows the expected and observed number of sites with the ancestral fixation (0 new fixations), with 1 new fixation, 2 new fixations and so on, up to 68 fixations, excluding empty rows (0 observed numbers for the 4 bases) and non-significant rows. For the 4 bases, the general distribution departed greatly from the expected random neutral distribution given by the Bose-Einstein model. There were a large number of conserved sites for the four ancestral bases (N = 0); the significance of this row is sufficient to make the whole table of 69 rows significant (some of them not presented). The numbers of significant values (P < 0.05) were: A = 13; T = 16; G = 19; C = 11. For 69 independent comparisons per column, 4 columns of data, at the 0.05 level of significance, there were 0.05 × 69 × 4 = 13.8 expected significant values occurring by random fluctuations; there were 59 significant ones (χ²₁ = 148.0, P < 10⁻³⁰), most of them with probabilities lower than 0.01.
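The arithmetic behind the count of significant comparisons can be checked with a few lines. The sketch assumes that the reported statistic is the single term (O - E)²/E computed on the number of significant values; this is the reading that reproduces the reported 148.0, and the variable names are illustrative.

```python
# Significant values (P < 0.05) per ancestral base, as reported in the text.
significant = {"A": 13, "T": 16, "G": 19, "C": 11}
rows, columns, alpha = 69, 4, 0.05

observed = sum(significant.values())   # total significant comparisons
expected = alpha * rows * columns      # expected by chance alone
# Assumed form of the statistic: one (O - E)^2 / E term on the significant count.
chi_square_term = (observed - expected) ** 2 / expected

if __name__ == "__main__":
    print(observed, round(expected, 1), round(chi_square_term, 1))
```

This gives 59 observed against 13.8 expected significant values, and a term of about 148.0.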

II) The Fixation-Matrix analysis
For neutral evolution, the mutation matrix, as well as the fixation matrix (APPENDIX 2), yielded equal expected frequencies of bases at equilibrium (a mathematical property of these matrices). These equilibrium frequencies were tested against the observed distribution of the ancestral bases, the distribution obtained with the total set of 103 strains, the distribution found in the longest segment and the base distribution found among the new fixated bases, which indicates the tendency towards some equilibrium (Table 2). The first comparison is tested by a χ²₃ with observed and expected numbers given by the frequencies multiplied by the number of ancestral bases (741 A + 526 T + 461 G + 373 C = 2,101 bases), and the others with their respective numbers. All the comparisons resulted in major deviations from the frequencies expected under neutral equilibrium. In the first three comparisons, there was a large excess of observed A and deficiencies of G and C; T was similar to the expected values. New fixated bases showed a highly significant deviation due to a moderate deficiency of A, a vast deficiency of T and excesses of G and C. These latter deviations cannot compensate for the former ones in relation to the frequencies at equilibrium; they indicate another direction of variation of base frequencies that is also distant from the expected equilibrium frequencies. These three analyses indicate a non-nomological trend; thus they indicate that the base frequency follows a contingent history.

III) Testing the homogeneity of fixation rates among the sites in a fixed triplet context
We analyzed whether the distribution of fixations in each site is randomly (or homogeneously) distributed in comparison to the expected distribution among all the sites (which we assumed is the best neutral estimate; see again Appendix 3). For each of the 4 ancestral bases, there are 16 neighborhoods given by the four upstream and four downstream bases. We studied the 64 triplets, but only AAA, TTT, GGG and CCC are presented, because the others (not shown) had similar patterns of deviations from the neutral expectations.
None of the 2,105 sites (among the 103 possible bases for each one) had proportions of bases near the expected equilibrium frequencies (Appendix 2): A (31%), T (25%), G (25%) and C (19%). This result is impossible under neutral or nearly neutral evolution, considering that the average number of new fixations per site (AF_K) was over 8.6. In agreement with this result, it was possible to unambiguously assign an ancestral base for 2,103 of the 2,105 sites. The expected numbers of AAA, TTT, GGG, and CCC triplets in 2,103 sites, according to the model strain, were 84.7, 29.7, 27.7 and 11.9, respectively. We had 75 (10 without new fixations), 34 (10 without new fixations), 31 (6 without new fixations) and 21 (7 without new fixations), respectively (the significance of the distribution of the number of fixations was tested for the whole segment in analysis I). The only significant excess, that of CCC, was assumed to be due to neighbor influence (to give additional advantage to the neutral theory). Table 3 shows the distribution of new fixations for the original central A, T, G, and C in the AAA, TTT, GGG and CCC context, respectively. Sites were clustered in sets, from those with the smallest to those with the largest number of new fixations. In total, transitions (tsi) were more frequent than transversions (tve), a well-known expectation. A (a purine) was replaced by new fixations more frequently by G (56%, purine) than by T and C (44%, pyrimidines). T was replaced more by C (56%) than by A and G (44%), G more by A (80%) than by T and C (20%), and C more by T (79%) than by A and G (21%). However, these total fixation rates were extremely heterogeneous among subsets of sites with different numbers of substitutions. The four χ² tests were significant and three gave p-values of less than 10⁻⁵. Heterogeneities for T and C are dramatic. Three subsets of T sites showed more tve than tsi: the 102-98 subset had 17:6 (tve:tsi), the 97-93 subset 25:20 and the 75-47 subset 90:69; the 102-93 subset of C had 19:10. From these subsets we analyzed the most substituted sites of A, T, G and C. Table 4 shows the analysis for Adenine.
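The transition/transversion bookkeeping used in these comparisons can be sketched as follows. Transitions are purine-to-purine or pyrimidine-to-pyrimidine changes and transversions are changes between the two classes; the function names are illustrative, and the split of the 17 transversions between A and G in the usage example is hypothetical (only the 17:6 total comes from the text).

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def is_transition(ancestral: str, new: str) -> bool:
    """True for a transition (A<->G or C<->T), False for a transversion."""
    if ancestral == new:
        raise ValueError("ancestral and new base must differ")
    for base in (ancestral, new):
        if base not in PURINES and base not in PYRIMIDINES:
            raise ValueError(f"unknown base: {base!r}")
    return (ancestral in PURINES) == (new in PURINES)


def count_tsi_tve(ancestral: str, new_base_counts: dict) -> tuple:
    """Return (transitions, transversions) for new fixations at a set of sites."""
    tsi = tve = 0
    for new, count in new_base_counts.items():
        if is_transition(ancestral, new):
            tsi += count
        else:
            tve += count
    return tsi, tve


if __name__ == "__main__":
    # Ancestral T, 102-98 subset: 6 transitions (to C) and 17 transversions.
    # The 10/7 split between A and G is a hypothetical illustration.
    print(count_tsi_tve("T", {"C": 6, "A": 10, "G": 7}))
```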
Among the 18 sites of the 92-80 ancestral A subset, there were 11 significant sites (trinomial P < 0.022). All the 16 sites of the 77-43 subset deviated significantly from the expected neutral distribution. There were non-significant sites in the first two subsets (102-93, not shown in Table 4); this does not mean they do not deviate from the neutral expectation, because they deviate in relation to several other sites of the total set (χ² analysis). Lack of power, i.e. few substitutions, did not allow a higher significance. The simple inspection of Table 4 shows that each site has its own deviation from the expected neutralist proportion (at the bottom, Tot %). Many sites presented only 1 or 2 new fixated bases, although they had enough fixations to show all 3 possible ones. Table 5 presents the case of Thymine, and Table 6 that of Guanine and Cytosine. T, G and C showed a pattern similar to that found for Adenine. The case of T in the 75-47 subset is remarkable, as is G in the 88-57 subset. It is clear that every site has its own pattern of new fixations.
Table caption: Testing the expected equilibrium frequencies calculated with the fixation-matrix method, against the frequencies of the ancestral bases, the total ancestral bases and fixations, the longest segment and new fixated mutations. N = number of ancestral bases; NS = number of nucleotide sites.

DISCUSSION
It is evident that mutation rates (m/s/g) are not substitution rates (su/s/g) and substitution rates are not fixation rates (f/s); they are dimensionally different.The neutralist error was to assume that once an allele or base reached a frequency of 100% by drift; it remains fixed (by drift) forever.Thus, the number of different bases found when comparing two or more DNA homologous segments, regardless of the time since they are evolutionary separated, was assumed to be the number of neutral substitutions.They are actually the number of selective fixations, because they are maintained by selection; neutral fixation is impossible (Wright, 1931;Feller, 1951;Valenzuela and Santos, 1996;Valenzuela, 2000Valenzuela, , 2002Valenzuela, , 2007Valenzuela, , 2009)).The expected frequency of a base that reached the frequency of 100% in the next generation is (1.0-m), being m the mutation rate.No allele or base can remain fixed, regardless of the population size (even with one individual; Valenzuela, 2000Valenzuela, , 2002Valenzuela, , 2007Valenzuela, , 2008)).Moreover, neutralists demonstrated that neutral evolution is independent of the population size (N) (Kimura, 1968;King and Jukes, 1969); as drift depends on N, we should conclude that neutral evolution is independent of drift (by logical laws of transitivity).Neutral evolution (as Brownian motion) occurs inexorably, regardless of the number of individuals (particles).For every nucleotide site of all genomes the expected neutral base distribution is approximately: 1 / 4 A, 1 / 4 T, 1 / 4 G and 1 / 4 C. 
Actual mutation rates that depart from equality do not substantively change this expectation; a six-parameter system of mutation rates conserves the equality of the frequencies of A and T, and of G and C, and yields similar figures for both pairs of bases (Sueoka, 1995; Valenzuela, 1997). The demonstration that not only neutral evolution, but also selective evolution, is independent of N and drift is outside the scope of this article. The expected evolutionary movement by absolute (random) drift is always zero (Valenzuela, 2007). Our first demonstration is that new fixations (NF), occurring over nearly 20 years, are not randomly distributed in relation to their number.
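The neutral expectation for base frequencies discussed above can be sketched numerically. The example below is only an illustration under stated assumptions: mutation is the sole force, the rate m = 0.01 is a hypothetical value chosen so that convergence is fast, and the pairing of the six rates (each substitution and its complementary substitution share one rate) is our reading of a six-parameter system, not a reproduction of the cited derivations. All function names are hypothetical.

```python
BASES = "ATGC"


def step(freqs, rates):
    """One generation of mutation; rates[x][y] is the probability that base x mutates to y."""
    new = {}
    for y in BASES:
        stay = 1.0 - sum(rates[y][z] for z in BASES if z != y)
        gain = sum(freqs[x] * rates[x][y] for x in BASES if x != y)
        new[y] = freqs[y] * stay + gain
    return new


def iterate(freqs, rates, generations):
    """Apply the mutation step repeatedly."""
    for _ in range(generations):
        freqs = step(freqs, rates)
    return freqs


def equal_rates(m):
    """Total mutation rate m per site, split equally among the three other bases."""
    return {x: {y: m / 3 for y in BASES if y != x} for x in BASES}


def six_parameter_rates(a, b, c, d, e, f):
    """Assumed pairing: a substitution and its complementary substitution share one rate."""
    return {
        "A": {"T": a, "G": b, "C": c},
        "T": {"A": a, "C": b, "G": c},
        "G": {"A": d, "T": e, "C": f},
        "C": {"T": d, "A": e, "G": f},
    }


m = 0.01  # hypothetical rate, far larger than real per-site rates
fixed_a = {"A": 1.0, "T": 0.0, "G": 0.0, "C": 0.0}

# A base at 100% is expected at (1.0 - m) one generation later.
after_one = step(fixed_a, equal_rates(m))

# With equal rates the frequencies approach 1/4 each.
equal_equilibrium = iterate(fixed_a, equal_rates(m), 5000)

# With six unequal (hypothetical) rates, A matches T and G matches C.
unequal = six_parameter_rates(0.002, 0.006, 0.001, 0.008, 0.002, 0.001)
six_equilibrium = iterate(fixed_a, unequal, 20000)

print(after_one["A"], equal_equilibrium, six_equilibrium)
```

The starting point (a site fixed for A) does not matter for the limiting frequencies; only the rates do, which is why the expectation applies to every site under pure mutation.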
A great proportion of sites did not allow NF. Unfortunately, these sites are separated from one another and do not indicate the existence of possible epitopes for preparing vaccines. Sites that had 0, 1, 2, 3, … n NF behave heterogeneously: a great number of them are invariant, some are almost invariant, others are moderately variable and some highly variable, but the spectrum deviates largely from randomness and cannot be produced by neutral or nearly neutral evolution, nor by the neighbor influence of bases (Tables 3 to 6 and Appendix 3). The second demonstration, with the expected frequencies of the bases at equilibrium, confirmed the first one. It may be argued that HIV-1 has not reached equilibrium frequencies, but this is untenable, because the ancestral distribution of bases (which represents HIV-1 at the origin of human infection), the total distribution of bases and that of the longest segment are very similar (with the exception of the non-significant T frequency). Thus, if there is any direction in the process of fixations, it is not towards the equilibrium frequencies. Moreover, the distribution of the bases of the new fixations is not the equilibrium distribution either, and it is not moving towards its own equilibrium. In summary, the structure of the fixation vector at any site shows a rather specific or contingent behavior, which is compatible with pan-selective evolution.
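For the first demonstration, the random expectation of how many sites carry 0, 1, 2, … fixations can be sketched with the standard Bose-Einstein occupancy formula, in which every distinguishable arrangement of indistinguishable fixations over the sites is equally likely. This is a minimal sketch: the function names and the numbers of sites and fixations are hypothetical, and the exact way the statistic was applied to the env data is not specified in this section.

```python
from math import comb


def bose_einstein_site_probability(k, n_sites, n_fixations):
    """Probability that a given site holds exactly k of the n_fixations fixations."""
    if n_sites < 2:
        raise ValueError("n_sites must be at least 2")
    if k < 0 or k > n_fixations:
        return 0.0
    favorable = comb(n_sites + n_fixations - k - 2, n_fixations - k)
    total = comb(n_sites + n_fixations - 1, n_fixations)
    return favorable / total


def expected_site_counts(n_sites, n_fixations):
    """Expected number of sites with k = 0, 1, ..., n_fixations fixations."""
    return [
        n_sites * bose_einstein_site_probability(k, n_sites, n_fixations)
        for k in range(n_fixations + 1)
    ]


# Hypothetical example: 150 fixations distributed over 200 sites.
expected = expected_site_counts(200, 150)
for k in range(6):
    print(k, round(expected[k], 2))
```

Observed counts of sites per number of fixations can then be set against these expected counts, for instance with a χ² comparison, to judge whether the spectrum of invariant, moderately variable and highly variable sites departs from the random expectation.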
These conclusions are affirmed by the analyses of the frequency vector of new fixations from the ancestral base, site by site, which revealed an extraordinarily high heterogeneity. It can be summarized as follows: each site has its own pattern of new fixations, even though we homogenized by an equal context of 3 consecutive bases to control for a possible neighbor influence. Thus, this highly significant heterogeneity refutes neutralism, near-neutralism and the neighbor influence hypothesis in these virus strains. How would this precise and fine pattern of selection for each site and strain be produced? A detailed model including what is known about the interaction between the host defense system and HIV-1 is beyond the objectives of this article. However, this issue has been partially addressed by Moore et al. (2002) and Martin and Carrington (2005). Those authors proposed that this pattern of molecular evolution in HIV occurs because of the highly polymorphic host responses and defenses, based mostly on the innate immune system and adaptive humoral or cell-mediated immunity. The variety, force and specificity of these three responses depend mainly on the HLA, immunoglobulins, immunoglobulin-like receptor systems, cell-mediated immunity processes and interleukins. HLA is one of the most polymorphic systems involved in viral antigen recognition through mechanisms performed by [...] the history of adaptive events of viral populations to evade host defenses, instead of the random drift of viral populations. As well, it has been shown that HIV envelope proteins are similar to host proteins (Reiher et al., 1986; Serres, 2001). These studies, as well as those on protein molecular mechanics (Hamacher, 2008) or drug resistance of HIV (Chen et al., 2004), give strong additional evidence for our conclusion that evolution is rather pan-selective (see also Valenzuela, 2009), but they do not change our conclusion, which holds without this other evidence. Researchers in molecular evolution might think that our analyses lack references to codons with their first, second and third positions, amino acid protein composition and synonymous or non-synonymous substitutions (Ka and Ks), which are the most frequent analyses found in the literature (Nei, 2005). However, most of those studies of transcriptional and post-transcriptional evolutionary stages are somewhat biased, since they assume a "neutral molecular background" of bases, genetic code, synonymous substitutions and other "constraints" whose selective or neutral origin and maintenance are not tested. We scanned all the sites, regardless of whether they are the first, second or third position of the codon or give rise to a synonymous or non-synonymous substitution. This scan of the whole DNA segment revealed that most, if not all, of the sites (compared with the universe of sites) had their own non-neutral (selective) pattern of pre-transcriptional evolution. Testing transcriptional or post-transcriptional stages of molecular evolution is unnecessary, because they are affected by the same non-neutral selective pattern. Post-transcriptional analyses would add different evidence on selective processes to this pre-transcriptional study only when a random distribution of fixations is found. The reader should realize that if neutral mutations occur at random in the first, second and third positions, their destiny (fixation, loss or polymorphism) is the same regardless of their position, and the expected frequency vector of mutation is also the same (see Appendix 3, where any other reference to transcriptional or post-transcriptional functions is not relevant to show a major deviation from randomness or neutrality). If the actual distribution of fixations in the codon positions is heterogeneous (as has been found in all the studies), then neutral and nearly neutral evolution are impossible. The present results are in complete agreement with those of a previous study in which the longitudinal total sequence of HIV-1 and the longest env segment were analyzed (Valenzuela, 2009). In longitudinal analyses, it is more evident that transcriptional and post-transcriptional functions are not relevant to show deviations from randomness, because they consider only one sequence. Both completely different studies lead to a single conclusion: the evolution of HIV-1 is close to pan-selective evolution. It is very probable that the heterogeneous pattern of fixations is due to the specific adaptive history of each virus strain in relation to the immune system of the host patients, rather than to historical random mutations and substitutions.

TABLE 1
Expected (Exp) and observed (Obs) number (N) of fixations in a site, according to Bose-Einstein statistics. AB = ancestral base in the site; P = probability; NS = non-significant. Only significant and non-empty rows are shown.

TABLE 3
Substitutions of original central bases in AAA, TTT, GGG and CCC triplets of the env gene in 103 HIV-1 sequences.

TABLE 4
Substitutions of the original central A in AAA triplets of the env gene among the most substituted 103 HIV-1 sequences.

TABLE 5
Substitutions of the original central T in TTT triplets of the env gene among 103 HIV-1 sequences.

TABLE 6
Substitutions of the original central G in GGG and C in CCC triplets of the env gene among 103 HIV-1 sequences.
N_G = number of the original base Guanine; N_C = number of the original base Cytosine.