Print version ISSN 0716-9760
Biol. Res. vol.44 no.3 Santiago 2011
Biol Res 44: 283-293, 2011
Heterogeneous periodicity of Drosophila mtDNA: new refutations of neutral and nearly neutral evolution
Carlos Y Valenzuela*
Programa de Genética Humana, ICBM, Facultad de Medicina, Universidad de Chile. Independencia 1027, Casilla 70061, Santiago, Chile.
We found a consistent 3-site periodicity of the χ²₉ values for the heterogeneity of the distribution of the second base in relation to the first base of dinucleotides separated by 0 (contiguous), 1, 2, 3 ... 17 (K) nucleotide sites in Drosophila mtDNA. Triplets of χ²₉ values were found where the first was over 300 and the second and third ranged between 37 and 114 (previous studies). In this study, the periodicity was significant up to a separation of K = 2011, and a structure of deviations from randomness among dinucleotides was found. The most deviant dinucleotides were G-G, G-C and C-G for the first, second and third element of the triplet, respectively. In these three cases there were more dinucleotides observed than expected. This inter-base correlation and periodicity may be related to the tertiary structure of circular DNA, like that of prokaryotes and mitochondria, to protect and preserve it. The mtDNA with 19,517 bp was divided into four equal segments of 4,879 bp. The fourth sub-segment presented a very low proportion of G and C, the internucleotide interaction was weaker in this sub-segment and no periodicity was found. The maintenance of this mtDNA structure and organization for millions of generations, in spite of a high recurrent mutation rate, does not support the notion of neutralism or near neutralism. The high level of internucleotide interaction and periodicity indicates that every nucleotide is co-adapted with the residual genome.
Key terms: DNA organization, non-protein-coding evolution, ordered nucleotide sequences, inter-base associations, refutation of neutralism.
Studies of evolution have assumed that most or all evolutionary processes are directly or indirectly related to protein synthesis and regulation (Nei, 2005; Valenzuela, 2009, 2010a; Nei et al., 2010). The acquisition and maintenance of the genetic code (as a whole) have been accepted non-critically as an out-of-evolution process. However, if the genetic code was acquired and is maintained by selective processes, all the other processes founded on the code are also selective. This non-critical position occurs with other aspects related to the structure, size and shape of the hereditary material (chromosomes, DNA or RNA segments), non-protein-coding functions and structures, replication functions (velocity, structural restrictions, etc.), number of bases, tandem repeat segments, isochores, signatures and several other properties of the hereditary material. For example, no systematic evolutionary studies have been performed on the acquisition and maintenance of isochores and signatures present in the biotic world for more than a thousand million years (Valenzuela, 2007, 2009, 2010a). Recurrent forward and backward point and chromosome mutations occurring equally (neutrally) at any nucleotide site destroy inexorably any chromosome or gene organization. These non-protein-synthesis and regulation-coding functions cannot be studied by the statistics of mutations, substitutions or fixations in relation to protein-coding functions (occurring at the first, second and third position of the codon), or in relation to associated functions as synonymous or non-synonymous fixations, or to biases of codon usage. The evolution of the chromosome structure (its constitution in centromeres, arms and telomeres) cannot be studied according to the genetic code for most chromosome segments have non-protein-coding functions. It is necessary to realize that protein-coding functions are a part of all the coding functions of the hereditary material. 
We can mention among these functions the folding-coding functions for packing DNA or RNA viruses into their capsids or envelopes; coiling or hypercoiling-coding functions of prokaryote or mitochondrial DNA; information for the relationships of DNA with histones or other associated proteins; information for the constitution of telomeres and centromeres, etc. These non-protein-coding characters and functions behave with non-Mendelian inheritance. Here, Mendelian inheritance is synonymous with particulate inheritance (Mendel's laws are cases of particulate segregation), in opposition to diffuse inheritance (Darwin's belief that the paternal and maternal "genetic factors" fuse in descendants). Thus, mitochondrial, prokaryote and virus inheritance, as we study them at present as far as genes are concerned, are fully Mendelian. The examination of a point mutation may help us to understand. When guanine (G) mutates to thymine (T) in a protein-coding segment, several kinds of phenotypes are produced (a mutation is always pleiotropic): I) A gene change leading to a synonymous or a non-synonymous mutation with Mendelian behavior. II) A structural change in the DNA because G (a purine, with two chemical rings) is larger than T (a pyrimidine, with one ring). This change is inherited as a Lamarckian character. III) A change in the velocity of replication and transcription because a G-C pair implies 3 hydrogen bonds and a T-A pair only 2 (a non-Mendelianly inherited character). IV) A change in the interactions with the residual genome (the remaining genome that is not this particular mutated site); this is partially a non-Mendelianly inherited trait and is the subject of this article. Other characters may be produced.
We analyzed these non-protein-coding functions, which depend on the nucleotide sequences, by studying the relations and correlations (not only statistical) among all the nucleotides of an RNA or DNA segment, excluding any reference to protein-coding functions. Our study seeks the sequence information for tertiary or quaternary DNA structures (in relation to non-DNA molecules). We seek to affirm or refute neutral, nearly neutral or selective evolution (Valenzuela and Santos, 1996; Valenzuela, 1997, 2000, 2002, 2007, 2009, 2010a; Valenzuela et al., 2010).
Unexpectedly, we found a very high correlation between both bases of a dinucleotide separated by 0, 1, 2 ... K (K = 35) nucleotide sites in the whole genome of the HIV-1 virus, and in one env gene of this virus (Valenzuela, 2009). This correlation has nothing to do with coding functions (it occurs between any nucleotide separated from the others by a number of sites that is or is not a multiple of 3). It is probably related to the tertiary structure needed to fold the RNA virus into its capsid. In a second study we found similar correlations until K = 21 in Drosophila melanogaster mtDNA, in the gene Torso, and in the human beta globin (βHb) gene (Valenzuela, 2010a). However, the significance of the interaction was observed until K = 309 in HIV-1 and until K = 609 in mtDNA (Valenzuela 2010b). These correlations between nucleotides separated by more than 4 sites do not support neutral or nearly neutral evolution and indicate that the evolution of HIV-1, mtDNA, gene Torso and βHb is rather panselective. This conclusion follows from the fact that a huge number of dinucleotides, significantly more frequent than randomly expected, have been positively selected over hundreds of millions of cell generations. Dinucleotides that are less frequent than randomly expected have been negatively selected again and again over hundreds of millions of cell generations. The maintenance of this strong interaction over hundreds of millions of DNA replication cycles can only be achieved by a widespread selective process, because recurrent mutation destroys any non-random nucleotide association.
Moreover, among these strong non-random distributions, a three-nucleotide periodicity that is not related to protein-synthesis coding functions was found only in mtDNA (Valenzuela, 2010a, 2010b, this article). This periodicity was seen in the series of χ²₉ values [9 degrees of freedom: four rows (less one) for the first nucleotide times four columns (less one) for the second nucleotide of the dinucleotide] that measure the distance to randomness (neutrality). It should be emphasized that this periodicity is not related to protein-coding processes, for the following reasons: I) Among gene segments, both strands of the mtDNA code for tRNA, rRNA and mRNA in the opposite 3'-5' sense; however, in the present study the analysis of correlations is performed in one strand. A few genes overlap in a small part of their sequences, but most of them are separated by a few nucleotide sites. Thus most if not all correlations of nucleotides separated by K sites calculated on the whole mtDNA do not coincide with any long series of codon positions. II) Within a gene segment the periodicity (for small K) is found in dinucleotides whose bases belong to the same or to another codon (see APPENDIX 1). III) When K is large, and since the largest gene segment (ND5) in this mtDNA has 1723 sites, the two bases belong mostly to different genes. IV) Since mtDNA codes for tRNA, rRNA and mRNA, the correlations among bases of a dinucleotide separated by large K often occur between nucleotides belonging to these three kinds of DNA (coding positions are defined only for mRNA). V) Some correlations are found between bases belonging to coding and non-coding segments. VI) We have found that the significant periodicity extends beyond K = 600 (Valenzuela, 2010b) or K = 2000 (this article), a distance beyond any protein-coding mtDNA segment.
The extension of this analysis to other mtDNA or genomes showed the same result (Valenzuela 2010a, 2010b, 2011). To test our program we examined the collagen type I alpha 2 gene (a periodical gene) and, as expected, a highly significant association of the bases of a dinucleotide was found every 3 and 9 nucleotide sites. Significant non-random associations and periodicities were found in long prokaryote genes, but not in non-periodical eukaryote genes (Valenzuela 2011) or in short prokaryote genes (Valenzuela unpublished). Significant deviations from randomness and periodicities were present in these genomes until K = 1007 or more (Valenzuela 2011, this article). It is possible to think that these non-random interactions and periodicities are due to mathematical or statistical artifacts from trivial properties of polymers, or that the highly significant association between contiguous nucleotides generates the others. The present study intends to show that this is not the case and that there is a systematic genetic structure underlying the base associations in dinucleotides separated by 0, 1, 2 ... K sites, in the Drosophila melanogaster mtDNA (19,517 bp) and in four equal consecutive segments of 4,879 bp.
RATIONALE, DATA AND METHOD
The expected internucleotide correlation under mutation alone
In the present disciplinary matrix of evolution, mutation occurs independently of the subsequent fate of the mutant allele or base and independently of the processes of natural selection or genetic drift (Prevosti, 2000; Valenzuela and Santos, 1996; Valenzuela, 2000, 2002a, 2011a). The mechanisms of mutation and repair occur with their own matter-energy characteristics. Thus, the occurrence of mutation at any site is mostly independent of the occurrence of mutation at any other site. This does not mean that mutation occurs at random, because the mutation rate is known to vary (cold-, hot- and normal-spots; Valenzuela and Santos, 1996; Li, 1997; Valenzuela, 2000), and mutation seems to be influenced by the neighborhood, at least in laboratory conditions with mutagens acting on viral RNA (Koch, 1971). The mutation rate varies enormously from organism to organism, but it is similar in similar organisms with some exceptions (Drake, 1993, 1999, 2009; Drake et al., 1998; Mackwan et al., 2008). It is assumed that equal neighbors have equal mutation rates; that is, mutation rates occur with isotropy in DNA or RNA nucleotide sites (Valenzuela, 1997). If we consider only mutations, under neutral evolution each of the 4 bases is expected to occupy a site with equal probability over a number of generations larger than the inverse of the mutation rate (Valenzuela and Santos, 1996; Valenzuela, 1997, 2000, 2002).
Then, the expected historical correlation of two neutral bases located in two different sites is zero. If the neighbor influence operates depending on one upstream and one downstream base, there are, for every base, 16 different contexts with 16 different influences on mutation rates that yield an average mutation rate for all these contexts (with two sites of influence on each side, 256 contexts are produced). However, these contexts should be, in turn, influenced by the contexts of each upstream base and each downstream base, and these second sets of contexts should be influenced by the bases that are 3 sites upstream and 3 sites downstream, and so on. Since all the bases are continuously mutating (fixation is impossible), it is expected that the average difference in neighbor influence on every mutation rate should be small or zero. Independently of these factors, mutation and its possible neighbor influence cannot generate the meaning of the DNA segments (a wide internucleotide correlation among all the sites needed to code a protein or a biotic hermeneutics), because mutation occurs independently of the environmental requirements for the living being. Mutation is, from a biotic viewpoint, hermeneutically powerless (Valenzuela, 2009, last paragraph). Thus the expectation for a correlation between the bases of a dinucleotide that are separated by 0, 1, 2 ... K sites is zero or near zero. Gatlin (1976) found non-random distribution of longitudinal nucleotide sequences and proposed that this sequential order was a proof of non-neutral evolution. Neutralists answered quickly (Jukes, 1976; Kimura and Ohta, 1977), proposing that the neighbor influence of bases on mutation rates could explain this order. However, this is an intuitive, ad hoc hypothesis that was never demonstrated and, as we saw, mutation plus the neighbor influence cannot produce the meaningful order we see in life (Valenzuela 2009, 2010a; Valenzuela et al., 2010).
It is very often proposed that the genetic code implies a constraint that explains the order. This argument falls into rational circularity, because the cause that originated and maintains the genetic code is sent to the unexplained, non-analyzable or un-debatable set of evolutionary processes (constraints). Recurrent mutation inexorably destroys any nucleotide sequence (also sequential constraints), as is evidenced regularly by the cancers, aging and genetic diseases of living beings (Valenzuela, 2007, 2009; Valenzuela et al., 2010).
The expected internucleotide correlation under mutation and random drift
Random fluctuations of genetic frequencies (genetic drift) could result in frequencies of alleles in a locus or bases in a site reaching frequency 1.0 (substitution), 0.0 (elimination or loss) or a value between 0.0 and 1.0 (polymorphism). However, by constitution and definition, genetic drift occurs independently and equally in all the nucleotide sites. Thus, it cannot generate a stable internucleotide correlation and is hermeneutically powerless. It may move frequencies up or down with the same average magnitude, but its final contribution is zero. A widespread error equates substitution (a turn-over process) with fixation (a permanent state). Thus, in early articles neutralists calculated the probability of what they named fixation, but it was substitution instead, because it was the probability of attaining the frequency 1.0 by random frequency fluctuations (Kimura 1962; Nei et al., 2010). To calculate the probability of fixation we need the number of generations over which the allele or base has remained fixed. The probability of remaining at frequency 1.0 is completely different for alleles or bases that remain at this frequency for one million generations. With recurrent forward and backward mutation, fixation is physically, logically, mathematically and biologically impossible (Wright, 1931; Feller, 1951; Valenzuela and Santos, 1996; Valenzuela, 2000, 2002, 2007, 2009, 2010a, 2011). Neither mutation nor drift can give sense to a DNA or RNA segment.
Thus, the expected average internucleotide correlation under recurrent forward and backward mutation and genetic drift is very small or stochastically zero. Gene mutations and genetic drift are hermeneutically empty biotic processes; they are similar to Brownian motion. This does not mean that they cannot give rise to biotic processes, but if they do, they would present random distributions of their elementary components (nucleotide or amino-acid sequences), as we shall see.
Only selection gives meaning to nucleotide sequences. Our logic of demonstration
The only process that can produce permanent biotic functional sense to sequences is selection, because it is a process of co-variation between biotic sequences and environmental requirements (adaptation). Our search for internucleotide correlations is founded in this feature of the evolutionary process. If we do not find significant non-random association between the two bases of dinucleotides, neutral or nearly neutral evolution is affirmed, but selective evolution is not refuted. However, if we find significant non-random associations between the two bases, neutral and nearly neutral evolution are refuted and selective evolution affirmed. Neutralists and nearly-neutralists included selection in their models but stated that their models were not related to adaptation (Ohta, 1992, 2002; Nei, 2005). They do not accept the pan-adaptationist condition of the Synthetic Theory of Evolution (Gould, 2002). Our position emphasizes that thermodynamically non-random nucleotide sequences cannot be maintained unless selection operates to do it. Thus, non-random sequences are really adaptive sequences that remain in spite of the strong tendency to entropic distributions. Dynamic non-random processes are physical conditions for life production and maintenance; thus, they are synonymous with adaptation (see also Introduction).
The Drosophila melanogaster mtDNA (GenBank accession NC_001709, with 19,517 nucleotide sites) was studied.
The heterogeneity of the distribution of the second base in relation to the first base in dinucleotides, whose bases are separated by 0 (contiguous), 1, 2, ... K nucleotide sites, was determined by a χ² test. The total deviation from randomness of the 16 possible dinucleotides (pairs) was determined by a χ²₉ test [9 = degrees of freedom, 3 independent rows (4 possible bases less 1) times 3 independent columns (4 possible bases less 1)], and the particular deviation of a pair by a χ²₁ test (its contribution to the total χ²₉ value). These χ² tests and their associated probabilities directly measure the distance to neutrality (randomness). The first purpose is to know whether positive (more dinucleotides than expected, ↑) or negative (fewer dinucleotides than expected, ↓) associations (in relation to random dinucleotide distribution) are present between the two bases of dinucleotides (χ² tests), and the extension of these associations in terms of the number of nucleotide sites (separation of K sites) and of the neighborhood influence. A second aim is to examine whether these base associations are homogeneously distributed along the whole mtDNA or whether they differ among four equal and consecutive sub-segments. For this aim, the total mtDNA with 19,517 nucleotide sites was divided into four consecutive and equal segments with 4,879 sites each. A previous study showed highly heterogeneous composition of bases and dinucleotides when the mtDNA was divided into ten segments (unpublished). Abbreviations: A = Adenine, T = Thymine, G = Guanine, C = Cytosine.
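The test described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's program: the function names `dinucleotide_table` and `chi_square` are hypothetical, and taking the expected counts from the row and column totals of each 4 x 4 table is an assumption, chosen because it is consistent with the 9 degrees of freedom stated in the text.

```python
from collections import Counter

BASES = "ATGC"

def dinucleotide_table(seq, k):
    """4 x 4 table of pair counts: rows = first base, columns = second base.

    The two bases of each pair are separated by k sites, so k = 0 means
    contiguous bases. Pairs containing any other symbol are ignored.
    """
    step = k + 1
    counts = Counter(zip(seq, seq[step:]))
    return [[counts[(a, b)] for b in BASES] for a in BASES]

def chi_square(table):
    """Pearson chi-square of the table (assumption: expected counts from margins)."""
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = 0.0
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            expected = r * c / n
            if expected > 0:  # skip cells of bases absent from the sequence
                total += (table[i][j] - expected) ** 2 / expected
    return total

# Toy input: in "ATGCATGC..." every base equals the base 4 sites ahead,
# so the association at k = 3 is complete.
toy_table = dinucleotide_table("ATGC" * 100, 3)
print(sum(map(sum, toy_table)), chi_square(toy_table))
```

A sequence of L sites yields L - (K + 1) dinucleotides at separation K, so the toy sequence of 400 sites gives 396 pairs at K = 3.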
More specific details of the methods are published (Valenzuela 2009, 2010a). The most important methodological features are presented with the help of APPENDICES 1 and 2. APPENDIX 1 describes the method to calculate the association for one coding segment. This appendix also shows that the correlations do not depend on protein-coding functions. APPENDIX 2 shows the analysis for dinucleotides whose bases are separated by 17 nucleotides (an example). Bases found in the 1st and 19th nucleotide sites constitute the first dinucleotide, the second includes bases at the 2nd and 20th sites, the third includes bases at the 3rd and 21st sites, and so on, until the last dinucleotide, whose bases are at the 19,499th and 19,517th sites. There are then 19,499 dinucleotides whose bases are separated by 17 sites. The expected number of dinucleotides is obtained by the frequency of the first base, times the frequency of the second base, times the total number of dinucleotides. Taking the observed frequencies of bases is the best estimate of the expected historical action of mutation rates and all the possible neighbor influences (as average) because these frequencies should be considered as the expected equilibrium frequencies under neutral evolution (see Valenzuela et al., 2010).
APPENDIX 2 presents dinucleotides ordered according to the significance of their deviation from randomness. A sign indicates whether there are more (↑) or fewer (↓) dinucleotides than expected. Significance at the 0.05 level is found for a χ²₁ value equal to 3.84 and for a χ²₉ value of approximately 17. There were 14 significantly deviated dinucleotides among the 16 possible pairs, 6 with excess and 8 with deficiency. The total excess was 825.6 and the deficiency added up to 825.5. The excesses are fewer, but larger than the deficiencies. Excesses are produced by positive selective processes, and deficiencies by negative selective processes that have been maintained over millions of mitochondrion generations to the present (see Rationale). The selective process occurs because mutations happen continuously and destroy any non-random associations. The facts that i) 14 pairs among 16 are significantly distant from the random (neutral) expected distribution and ii) the χ²₉ test gives the enormous value of 322.2 (P < 10⁻⁵⁰) refute neutral evolution definitively. We observe that most (14/16 = 87.5%) pairs of bases chosen at random and separated by 17 nucleotide sites are distributed enormously far from the expected neutral distribution. Furthermore, considering that I) this distance to neutrality has been maintained over millions of mitochondrion generations, and II) major significances were found with separations from 0 to more than 2000, we can only conclude that these evolutionary conditions are impossible under neutral and nearly neutral evolution.
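The ranking of pairs by their individual contribution can be sketched as follows. The counts in `toy` are invented for illustration and are not the APPENDIX 2 data; the function name `pair_contributions` is hypothetical, and expected counts are assumed to come from the margins of the table. Under that assumption the excesses and deficiencies sum to the same total, which would explain why the two totals reported above agree up to rounding.

```python
BASES = "ATGC"

def pair_contributions(table):
    """Return (pair, observed - expected, chi-square contribution), most deviant first."""
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    out = []
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            expected = rows[i] * cols[j] / n
            diff = table[i][j] - expected
            chi1 = diff ** 2 / expected if expected > 0 else 0.0
            out.append((f"{a}-{b}", diff, chi1))
    return sorted(out, key=lambda item: item[2], reverse=True)

# Invented counts (rows: first base A, T, G, C; columns: second base A, T, G, C).
toy = [[50, 30, 10, 10],
       [30, 50, 10, 10],
       [10, 10, 20, 5],
       [10, 10, 5, 20]]

ranked = pair_contributions(toy)
excess = sum(diff for _, diff, _ in ranked if diff > 0)
deficit = -sum(diff for _, diff, _ in ranked if diff < 0)
significant = [pair for pair, _, chi1 in ranked if chi1 >= 3.84]  # 0.05 level, 1 df

for pair, diff, chi1 in ranked[:5]:
    print(pair, "excess" if diff > 0 else "deficit", round(chi1, 1))
print(round(excess, 3), round(deficit, 3), len(significant))
```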
Table 1 shows the total χ²₉ value found in the entire Drosophila mtDNA, when the bases of dinucleotides are separated by 0 (contiguous or consecutive), 1, 2, 3 ... 17 nucleotide sites, and the χ²₁ contribution to the χ²₉ value by the five most significant pairs, ordered according to the value of their significance. The total significance for the χ²₉ at the 0.05, 0.025, 0.01, 0.005 and 0.001 critical probability levels is found with χ²₉ values of 16.9 (17), 19.0 (19), 21.7 (22), 23.6 (24) and 27.9 (28), respectively (rounded to integers in parentheses). The significance of the deviation from the expected randomness of a particular pair may be evaluated by its contribution to the total χ²₉, using the χ²₁ values, which are 3.84 (4), 5.02 (5), 6.64 (7), 7.88 (8) and 10.83 (11) for the same critical probability levels, respectively. The χ² values of Table 1 have been rounded to integer figures. The most significant figure for contiguous (0 separation) bases was the excess of G-G, followed by excess of C-C, deficiency of G-T, excess of G-C and excess of T-T pairs (χ²₁ = 27.8); other deviations were less significant, even though six of them had χ²₁ over 4 or a probability of less than 0.05 [(A-C)↓ 25; (T-C)↓ 11; (T-G)↓ 11; (A-A)↑ 11; (C-A)↓ 10; (T-A)↓ 5]. The spectrum of significances changes when K increases, as can be seen in the table (and in Table 2). The most significant pairs (1st pairs) showed more observed pairs than expected (↑); the other pairs showed both possibilities (↑, ↓).
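The critical values quoted above can be checked with the Python standard library alone. The sketch below uses the standard closed-form series for the upper tail of a chi-square distribution with an odd number of degrees of freedom; the function name `chi2_sf_odd` is hypothetical and the code is offered as a check, not as part of the original method.

```python
import math

def chi2_sf_odd(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with odd df."""
    if df < 1 or df % 2 == 0:
        raise ValueError("this closed form only covers odd degrees of freedom")
    if x <= 0:
        return 1.0
    chi = math.sqrt(x)
    tail = math.erfc(chi / math.sqrt(2))              # two-sided normal tail
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    series = 0.0
    term = chi                                        # chi^(2r-1) / (1*3*...*(2r-1)), r = 1
    for r in range(1, (df - 1) // 2 + 1):
        series += term
        term *= x / (2 * r + 1)                       # advance to r + 1
    return tail + 2 * density * series

for value in (16.92, 19.02, 21.67, 23.59, 27.88):
    print(value, round(chi2_sf_odd(value, 9), 4))     # 9 df: total test
for value in (3.84, 5.02, 6.64, 7.88, 10.83):
    print(value, round(chi2_sf_odd(value, 1), 4))     # 1 df: single pair
```

The printed probabilities should fall close to 0.05, 0.025, 0.01, 0.005 and 0.001 in each group.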
We see a clear periodicity in the χ²₉ value after the separation by 1 site. There are triplets of χ²₉ values, the first over 300, which we named the head figure, followed by two consecutive smaller figures between 37 and 114 (tail 1 and tail 2 figures). The largest χ²₁ contribution for the head figure was always given by an excess of G-G pairs, while for tail 1 and tail 2 figures it was given by excesses of G-C and C-G, respectively. There are other ordered distributions in the 2nd, 3rd, 4th and 5th pairs, but they are not as exclusive as those found in the most significant pair. Their analysis is left to the reader.
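How a 3-site periodicity shows up in a series of chi-square values can be illustrated on synthetic data. The sketch below is a toy, not a model of the mtDNA and not a statement about what causes the observed pattern: the base weights `G_RICH` and `AT_RICH` are invented, every third site is simply drawn from a different composition, and the function name `chi_square_at` is hypothetical. A shuffled copy of the same sequence serves as a control with identical base composition and no order.

```python
import random
from collections import Counter

BASES = "ATGC"

def chi_square_at(seq, k):
    """Chi-square (9 df) for pairs of bases separated by k sites (margin-based expectation)."""
    pairs = Counter(zip(seq, seq[k + 1:]))
    n = sum(pairs.values())
    first, second = Counter(), Counter()
    for (a, b), count in pairs.items():
        first[a] += count
        second[b] += count
    total = 0.0
    for a in BASES:
        for b in BASES:
            expected = first[a] * second[b] / n
            if expected > 0:
                total += (pairs[(a, b)] - expected) ** 2 / expected
    return total

rng = random.Random(0)                  # fixed seed so the sketch is repeatable
G_RICH = [0.30, 0.30, 0.30, 0.10]       # invented weights for A, T, G, C
AT_RICH = [0.45, 0.45, 0.05, 0.05]      # invented weights for A, T, G, C
periodic = [rng.choices(BASES, G_RICH if i % 3 == 0 else AT_RICH)[0]
            for i in range(30000)]
shuffled = periodic[:]
rng.shuffle(shuffled)                   # same composition, order destroyed

periodic_chi = [chi_square_at(periodic, k) for k in range(9)]
shuffled_chi = [chi_square_at(shuffled, k) for k in range(9)]
for k in range(9):
    print(k, round(periodic_chi[k]), round(shuffled_chi[k]))
```

In this toy the large values fall at K = 2, 5 and 8 by construction, each followed by two smaller values, while the shuffled control stays near the value expected for 9 degrees of freedom.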
The periodicity in triplets of the χ²₉ values, which measure the deviation from the expected random distribution of the second base in relation to the first base of dinucleotides separated by 0, 1, 2 ... 17 sites, was observed until K = 609 (Valenzuela 2010b). It is also evident that the most significant pair shows a periodicity in these triplets: G-G, G-C and C-G, which could be followed until separation 36, with only one exception (separation 33, where G-C was replaced by G-G). However, it must be noted that the χ² test is rough for finding fine nucleotide associations because its high degree of variance may lead to variable hierarchical orders. Table 2 presents the analysis for two sets of 10 K: 1000-1009 K and 2002-2011 K. In both sets the periodicity is evident, with head values between 48 and 65 and tail values between 13 and 39 in the range 1000-1009 K, and head values between 38 and 49 and tail values between 7 and 21 in the range 2002-2011 K. It is remarkable that in the range 1000-1009 K, 9 among 10 χ²₉ values are significant, with a highest probability equal to 0.0062 (χ²₉ = 23), and that there are 6 significant χ²₉ values among 10 in the range 2002-2011 K, with a highest probability of 0.0179 (remember the inverse relationship between significance and probability). It is also remarkable that the most significant χ²₁ value was always given by more observed pairs than expected (↑) and was mostly G-G as head, and G-C and C-G as 1st and 2nd tail, respectively, in the 1000-1009 K range. In the 2002-2011 K range these last relationships hold, but it is necessary to consider the five χ²₁ values to reconstruct them, given that other pairs appeared as the most deviated from randomness.
These pair structures are averages found when scanning large DNA segments. The existence of micro-isochores (sub-segments with different base composition; Valenzuela 1997, 2009, this article) and micro-signatures (sub-segments with different dinucleotide compositions; Valenzuela 2009, this article) could change the structure of base associations of dinucleotides. Figure 1 shows that the Drosophila melanogaster mtDNA has different composition of bases in its segments. Thus, an analysis of the mtDNA divided into four equal segments was performed. Tables 3, 4, 5 and 6 show this analysis for the 1st (sites 1-4,879), 2nd (4,880-9,758), 3rd (9,759-14,637) and 4th (14,638-19,516) segments, respectively.
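The division into four equal consecutive segments, and the base composition of each, can be sketched as follows. The function names `equal_segments` and `base_composition` are hypothetical, and the choice to leave the remainder site at the end of the sequence unassigned is an assumption that follows from 4 x 4,879 = 19,516.

```python
from collections import Counter

def equal_segments(total_sites, parts):
    """1-based inclusive (start, end) bounds of equal consecutive segments."""
    size = total_sites // parts          # leftover sites at the end stay unassigned
    return [(i * size + 1, (i + 1) * size) for i in range(parts)]

def base_composition(seq, start, end):
    """Count and percentage of each base in sites start..end (1-based, inclusive)."""
    segment = seq[start - 1:end]
    counts = Counter(segment)
    return {base: (counts[base], 100 * counts[base] / len(segment))
            for base in "ATGC"}

bounds = equal_segments(19517, 4)
print(bounds)

# Toy sequence, only to show the call; a real run would pass the mtDNA string.
print(base_composition("AATGC" * 2, 1, 5))
```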
The base composition of the segments was: 1st (A = 1,663, 34.08%; T = 2,029, 41.59%; G = 567, 11.62%; C = 620, 12.71%); 2nd (A = 2,126, 43.57%; T = 1,712, 35.09%; G = 428, 8.77%; C = 613, 12.56%); 3rd (A = 1,972, 40.42%; T = 1,899, 38.92%; G = 397, 8.14%; C = 611, 12.52%); 4th (A = 2,391, 49.01%; T = 2,242, 45.95%; G = 87, 1.78%; C = 159, 3.26%). The high degree of heterogeneity of base composition of these four segments is evident (micro-isochores); in particular, the G and C proportions decay around 5 and 3.8 times, respectively, in the fourth segment, with the corresponding increase of A and T proportions. The χ²₉ value for the heterogeneity of base composition of these 4 segments was 855.4 (P < 10⁻¹⁰⁰). The pair structure found in the total mtDNA is partially valid for these segments (especially for the excess of G-G as the most significant pair), except for the 4th segment, where pairs other than G-G were the most significant. A detailed analysis and comparison of the four segments are left to the reader.
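The heterogeneity statistic can be recomputed from the base counts given above. This is a sketch of a standard 4 x 4 contingency test (segments by bases, expected counts from the margins); the function name `heterogeneity_chi_square` is hypothetical and the code is not the author's program.

```python
# Base counts per segment (A, T, G, C), as reported in the text.
SEGMENT_COUNTS = [
    [1663, 2029, 567, 620],   # 1st segment
    [2126, 1712, 428, 613],   # 2nd segment
    [1972, 1899, 397, 611],   # 3rd segment
    [2391, 2242, 87, 159],    # 4th segment
]

def heterogeneity_chi_square(table):
    """Pearson chi-square and degrees of freedom for a segments-by-bases table."""
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

chi2, dof = heterogeneity_chi_square(SEGMENT_COUNTS)
print(round(chi2, 1), dof)   # about 855.4 with 9 degrees of freedom
```

Each row sums to 4,879 sites, and the statistic agrees with the value reported in the text.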
Highly significant (statistical) interactions were found between any nucleotide and nucleotides separated by as many as 2011 nucleotide sites. This demonstrates that in the mtDNA organization any nucleotide maintains strong non-random associations with the whole mtDNA. Moreover, these interactions include a highly significant periodicity between the bases of dinucleotides when the bases are separated by 0, 1, 2 ... 2011 sites at least. The internucleotide correlations we have just described can be seen as processes of internucleotide co-adaptation. Those dinucleotides whose frequencies are over the random expected frequency were positively selected and are now maintained by positive selection; those that are below their expected frequencies were and are negatively selected. These strong associations, maintained over several million mitochondrion generations (the time during which Drosophila melanogaster has had this mtDNA), refute the neutral theory, the nearly neutral theory and the neighbor influence of a base on mutation rates of its neighborhood as main factors of evolution. It is impossible to maintain this organization during that time by random mutation, genetic drift and weak natural selection (nearly neutral evolution). On the contrary, forward and backward recurrent mutation and drift are processes that should inexorably destroy this organization (Valenzuela and Santos 1996; Valenzuela 1997, 2000, 2002, 2007, 2009; Valenzuela et al., 2010). Recurrent forward and backward mutations make neutral fixation impossible (Wright 1931; Feller 1951; Valenzuela 2000, 2002a; Valenzuela et al., 2010). Mutations do occur in mtDNA of eukaryote organisms during their life and are a main factor in aging and death (Gredilla et al., 2010). In addition, hundreds of human mtDNA mutations are known that produce lethal or sub-lethal conditions (Tuppen et al., 2010).
Thus, mtDNA is almost always destroyed (depending only on the life span of the individual) during the life of eukaryote individuals, but it is much more stable in phylogeny. The individual instability and phylogenetic stability are only possible if there are strong selective mechanisms (DNA repair and protection; Gredilla et al. 2010) acting on mitochondria from one generation to the next, especially in females among sexually reproductive species. The HIV-1 virus has a high correlation among its genome bases, but not periodicity (Valenzuela 2009, 2011). The HIV-1 virus may need this correlation to fold the RNA chain into the capsid, which is not needed by the mtDNA. However, mtDNA needs coiling and hypercoiling to protect itself and locate functionally in the mitochondrion matrix. It is fascinating that this structure may be present in prokaryotes, as we found preliminarily (Valenzuela 2010a, 2011).
The strong non-random association between a nucleotide (and its complementary one) in a DNA site and the residual genome, maintained over millions of generations, convinced us that the main selector (selection factor) for this nucleotide site is not the environment, but the residual genome. Once hereditary polymers are acquired in biotic systems, evolution goes on mainly by polymer interactions and recombination (Valenzuela, 2002a, 2002b), either in the endogenous or exogenous plane. Horizontal evolution, sex and symbiogenesis are positive selective mechanisms of inter-polymer selection in evolution. However, if other polymers are the best "friends" for a particular polymer, they may also be its worst enemies (negative selection) leading to polymer destruction, competition, diseases and extinction. We see only those organisms that reached a resilient equilibrium after the intra-individual polymers' coexistence or fusion.
The different structures of deviations from randomness according to significances found in the four segments with equal numbers of nucleotides indicate that the nucleotide association in dinucleotides is rather flexible and depends on the base composition of the segment (a relationship among purines and pyrimidines and base complementarity). The strong G-G, G-C and C-G associations found in the head, tail 1 and tail 2 periodicities, respectively, in the whole mtDNA represent the most frequent deviation from randomness, but their frequencies vary in the four segments. Perhaps these statistical attractions and repulsions indicate physical attractions and repulsions that are necessary to accomplish important non-protein-synthesis functions that were and are crucial in evolutionary development. The high degree of heterogeneity of the base composition of the four segments, maintained for millions of mitochondrion generations, conclusively refutes neutral and nearly neutral evolution and the neighbor influence hypothesis (independently of the internucleotide interactions described above). These theories and this hypothesis predict a homogeneous distribution instead.
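The comparison of base composition among the four segments can be sketched as follows. This is only an illustration, assuming that the sequence is cut into consecutive segments of equal length (any leftover bases at the end are dropped) and that homogeneity is tested with a Pearson chi-square on the segment by base table; the function names are hypothetical and not those of the programs used here.

```python
from collections import Counter

BASES = "ACGT"


def segment_base_counts(seq, n_segments=4):
    """Base counts in n_segments consecutive segments of equal length.

    Bases left over at the end, when the length is not divisible by
    n_segments, are ignored.
    """
    seq = seq.upper()
    size = len(seq) // n_segments
    return [
        Counter(b for b in seq[i * size:(i + 1) * size] if b in BASES)
        for i in range(n_segments)
    ]


def gc_fraction(counts):
    """Proportion of G plus C among the counted bases of one segment."""
    total = sum(counts[b] for b in BASES)
    return (counts["G"] + counts["C"]) / total if total else 0.0


def homogeneity_chi_square(segments):
    """Pearson chi-square for equal base composition across segments."""
    total = sum(sum(c[b] for b in BASES) for c in segments)
    if total == 0:
        return 0.0
    base_totals = {b: sum(c[b] for c in segments) for b in BASES}
    chi2 = 0.0
    for c in segments:
        seg_total = sum(c[b] for b in BASES)
        for b in BASES:
            # Count expected if all segments shared one base composition
            expected = seg_total * base_totals[b] / total
            if expected > 0:
                chi2 += (c[b] - expected) ** 2 / expected
    return chi2
```

With four segments and four bases this statistic has 9 degrees of freedom when every base occurs; a homogeneous distribution, as predicted by the theories discussed above, would give a value near zero.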
This article could finish here; however, there are fundamental misconceptions in the neutralist and nearly-neutralist position that need special treatment (Valenzuela, 2000, 2002a, 2010a, 2010b; Valenzuela et al., 2010). Perhaps the reader, habituated to phylogenetic analyses based on protein-coding features such as coding positions or synonymous and non-synonymous mutations, thinks our conclusive refutation of neutral and nearly neutral evolution is rather unsupported. We have received the criticism, from neutralist and non-neutralist colleagues, that the Neutral Theory of Evolution accepts "purifying selection" in the form of lethal and sub-lethal mutations as an important part of the theory. However, this proposition is not true as far as mtDNA is concerned, because clinical genetic practice (a method with very low sensitivity to study selectors) shows that mtDNA mutations (in humans and animals) range from lethal and sub-lethal to compatible with life, with or without impairment of reproduction (Tuppen et al., 2010). As mentioned above, neutralists protected the theory by including "constraints" such as the genetic code, the restriction to four nucleotide bases, some invariant functional parts of proteins or DNA, biases of codon usage, etc. However, neutralists or nearly neutralists (and selectionists or neo-Darwinists) never advanced a proportion of evolution that is due to lethal or sub-lethal nucleotide mutations or to invariant constraints, so as to put these hypotheses to the test (epistemologically, this proposition is a negative heuristic protective belt). If the proportion of evolution in every nucleotide site is mostly (over 50%) due to lethal and sub-lethal bases, then evolution is, by definition, selective and not neutral or nearly-neutral. But in this case life is impossible. The same occurs with constraints. First, studies of protein constraints were focused on the active site of enzymes.
Then allosteric sites, receptor sites, signal sequences, membrane-attachment sequences, and several other functional amino-acid sequences that conferred an invariant function to almost every amino acid of a protein were described. Now, it is difficult to conceive of any amino acid of a protein without a function that severely constrains it. Constraints proposed without precision are also negative heuristic protective belts of neutralism or near neutralism. A simple question shows this: what is a constraint in the evolutionary process? Or, how were constraints acquired and maintained in evolution during paleontological eras? To propose that the genetic code and the eukaryote, prokaryote, unicellular, multicellular, vertebrate (and so on) organizations were acquired and are maintained by mutation and random drift is simply madness. Evolution is mostly conservation, not variation (Valenzuela, 2007, 2009; Valenzuela et al., 2010). That the genetic code or any organization was acquired and is maintained mostly by selection implies that evolution is selective. Constraints are mostly produced by non-random nucleotide sequences; thus, they were included in the present analysis. The nearly neutral theory of evolution added selection to mutation and drift with a positive selection coefficient (Ohta, 1992, 2002), making this theory almost indistinguishable from the Synthetic Theory of Evolution.
The present analysis is a trans-, supra- or non-protein-coding study. Moreover, these base-to-base interactions and periodicities occur between bases separated by 0 to more than 2,000 sites, and mitochondrial genes have fewer than 1,750 nucleotides. On the contrary, it is expected that a gene sequence disturbs or destroys non-protein-coding periodicities to specify its own coding message, which is seldom periodical. Furthermore, no periodical G-G, G-C or C-G dinucleotide associations in mtDNA, with the two bases separated by 0 to more than 2,000 sites, are involved in protein-coding processes. Now, if we add to these widespread non-protein-coding co-adaptive interactions those due to protein-coding adaptive processes, we will have a more complete picture of pan-adaptive evolutionary processes. The reader interested in protein-coding functions will find the total nucleotide information for this mtDNA through the accession number in GenBank. The fourth segment includes most of the control region, with several TA tandem repeat regions (see Fig. 1). However, the information about the gene organization is, at present, not necessary for our study.
Note. These ideas were presented at the Annual Meeting of the Chilean Society of Evolution and the Chilean Society of Genetics, in Concepción, Chile, October 21-23, 2009.
I am indebted to my student, now my colleague, Javier Cisternas for programming figures with the base distribution and for improving my programs.
DRAKE JW (1993) Rates of spontaneous mutation among RNA viruses. Proc Natl Acad Sci USA 90:4171-4175.
DRAKE JW (1999) The distribution of rates of spontaneous mutation over viruses, prokaryotes, and eukaryotes. Ann N Y Acad Sci 870:100-107.
DRAKE JW (2009) Avoiding dangerous missense: Thermophiles display especially low mutation rates. PLoS Genetics 5(6):e1000520.
DRAKE JW, CHARLESWORTH B, CHARLESWORTH D, CROW JF (1998) Rates of spontaneous mutation. Genetics 148:1667-1686.
FELLER W (1951) Diffusion processes in genetics. Proc Second Berkeley Symp Math Stat Prob. pp: 227-246.
GATLIN LL (1976) Counter-examples to a neutralist hypothesis. J Mol Evol 7:185-195.
GOULD SJ (2002) The structure of evolutionary theory. The Belknap Press of Harvard University Press, Cambridge, MA, USA. pp: 518-524.
GREDILLA R, BOHR VA, STEVNSNER T (2010) Mitochondrial DNA repair and association with aging - An update. Exp Gerontol doi:10.1016/j.exger.2010.01.017.
JUKES TH (1976) Comments on Counter-Examples to a Neutralist Hypothesis. J Mol Evol 8:295-297.
KIMURA M (1962) On the probability of fixation of mutant genes in a population. Genetics 47:713-719.
KIMURA M, OHTA T (1977) Further Comments on "Counter-Examples to a Neutralist Hypothesis". J Mol Evol 9:367-368.
KOCH RE (1971) The influence of neighboring base pairs upon base-pair substitution mutation rates. Proc Natl Acad Sci USA 68:773-776.
LI WH (1997) Molecular Evolution. Sunderland: Sinauer Associates.
MACKWAN RR, CARVER GT, KISSLING GE, DRAKE JW, GROGAN DW (2008) The rate and character of spontaneous mutation in Thermus thermophilus. Genetics 180:17-25.
NEI M (2005) Selectionism and neutralism in molecular evolution. Mol Biol Evol 22:2318-2342.
NEI M, SUZUKI Y, NOZAWA M (2010) The neutral theory of molecular evolution in the genomic era. Annu Rev Genomics Hum Genet. Jun 21 (on line) PMID:20565254.
OHTA T (1992) The Nearly Neutral Theory of molecular evolution. Annu Rev Ecol Syst 23:263-286.
OHTA T (2002) Near-neutrality in evolution of genes and gene regulation. Proc Natl Acad Sci USA 99:16134-16137.
PREVOSTI A (2000) La selección natural 30 años después. Memorias de la Real Academia de Ciencias y Artes de Barcelona. Tercera Época N° 964; 58(9):349-397.
TUPPEN HAL, BLAKELY EL, TURNBULL DM, TAYLOR RW (2010) Mitochondrial DNA mutations and human diseases. Biochim Biophys Acta 1797:113-128.
VALENZUELA CY (1997) Non random DNA evolution. Biol Res 30:117-123.
VALENZUELA CY (2000) Misconceptions and false expectations in neutral evolution. Biol Res 33:187-195.
VALENZUELA CY (2002a) A biotic Big Bang. In: Palyi G, Zucchi C & Caglioti L (eds) Fundamentals of Life. Paris: Elsevier. pp: 197-202.
VALENZUELA CY (2002b) Does biotic life exist? In: Palyi G, Zucchi C & Caglioti L (eds) Fundamentals of Life. Paris: Elsevier. pp: 331-334.
VALENZUELA CY (2007) Within selection. Rev Chil Hist Nat 80:109-116.
VALENZUELA CY (2009) Non-random pre-transcriptional evolution in HIV-1. A refutation of the foundational conditions for neutral evolution. Genet Mol Biol 32:159-169.
VALENZUELA CY (2010a) Internucleotide correlation and nucleotide periodicity in Drosophila mtDNA: New evidence for panselective evolution. Biol Res 43:497-502.
VALENZUELA CY (2010b) Periodicidades e interacciones del DNA. El fin del neutralismo y del casi neutralismo (Textbook, in press).
VALENZUELA CY (2011) Nucleotide Correlation and Periodicity. End of Neutral and Nearly-Neutral Evolution (to be sent).
VALENZUELA CY, SANTOS JL (1996) A model of complete random molecular evolution by recurrent mutation. Biol Res 29:203-212.
VALENZUELA CY, FLORES SV, CISTERNAS J (2010) Fixations of the HIV-1 env gene refute neutralism: new evidence for pan-selective evolution. Biol Res 43:149-163.
WRIGHT S (1931) Evolution in Mendelian populations. Genetics 16:97-159.
* Independencia 1027, Casilla 70061, Independencia, Chile. Fax: (56-2) 7373158; Phone: (56-2) 9786302; E-mail: firstname.lastname@example.org
Received: April 14, 2010. In revised form: February 7, 2011. Accepted: March 8, 2011.