Discrepancy variation of dinucleotide microsatellite repeats in eukaryotic genomes

To address whether variation differs among repeat motif types and among taxonomic groups, we present an analysis of the variation and correlation of dinucleotide microsatellite repeats in eukaryotic genomes. Ten taxonomic groups were compared: primates, mammalia (excluding primates and rodentia), rodentia, birds, fish, amphibians and reptiles, insects, molluscs, plants and fungi. The data used in the analysis were taken from the literature published in the journal Molecular Ecology Notes. Analysis of variation reveals no significant differences between the AC and AG repeat motif types. Moreover, the number of alleles correlates positively with the copy number in both AG and AC repeats. Similar conclusions can be drawn for each taxonomic group. These results strongly suggest that SSR variation increases almost linearly with the copy number of each repeat motif. The results also suggest that the variability of SSRs in the genomes of low-ranking species is greater than that in high-ranking species, excluding primates and fungi.

Key terms: Microsatellites, Eukaryotic genomes, Variation of SSR, Copy number, Meta-analysis.

Corresponding author: College of Fisheries and Life Science, Shanghai Ocean University, No 999 Huckerg Ring Rd, Lingang New City, Shanghai, 201306, Tel.: +86-518-85895252, Email slcai@shou.edu.cn

Received: January 23, 2009. In revised form: June 25, 2009. Accepted: July 7, 2009.

Biol Res 42: 365-375, 2009

INTRODUCTION
Microsatellites, also called simple sequence repeats (SSRs) or short tandem repeats (STRs), are a class of repetitive DNA sequences widespread in all eukaryotic genomes (Tautz & Renz, 1984; Aishwarya, 2007) and most prokaryotic genomes (Sreenu et al, 2003). SSRs typically consist of motifs of 1-6 bp repeated many times in tandem. It is widely accepted that short proto-microsatellites, which arise by chance in the genome, are expanded by replication slippage, a mutation process specific to microsatellite DNA (Schlötterer & Harr, 2000). Slippage mutations occur during DNA replication by displacement of the nascent strand, which subsequently realigns out of register. If DNA synthesis continues on this misaligned molecule, the repeat number of the microsatellite is altered (Tautz & Schlötterer, 1994). SSRs show a high degree of length polymorphism among individuals, and the degree of variability in array length can be measured by the number of alleles.
If the above slippage-mutation-recombination theory is right, we can assume that the abundance of a given SSR repeat type in genomes should reflect the mutability of that repeat type, i.e. the more abundant the repeat type, the higher its mutation rate. We also infer that, over long macroevolutionary time periods, there must be significant differences in SSR mutation rate among species, particularly between higher and lower organisms. In fact, among major taxonomic groups, microsatellites exhibit considerable variation in composition and allele length, but they also show considerable conservation within many major groups (Nikitina & Nazarenko, 2004; Grover et al, 2007; Gao & Kong, 2005; Edwards et al, 1998; Zhang et al, 2004). This variation may be explained by slow microsatellite evolution, so that all species within a group have similar patterns of variation, or by taxon-specific mutational or selective constraints (Ross et al, 2003). However, comparing microsatellites across species may be problematic because of the biases that may exist among different isolation and analysis protocols, and similar studies have so far been limited to species belonging to the same genus (Ross et al, 2003) or to individuals of the same species (Brandström & Ellegren, 2008). Nevertheless, by analyzing a large body of data and setting controls, the influence of ascertainment bias on comparisons of SSR mutation across taxonomic groups can be minimized.
By analyzing the relationships between SSR motifs and their variation, we hope to address whether the variation of the various microsatellite motifs is similar in the genomes of different taxa and how SSR variation differs among repeat motif types. The results are expected to be useful for understanding the causes and consequences of genome evolution at microsatellite loci.

MATERIALS AND METHODS



Source of data
The microsatellite data (repeat motifs and their corresponding numbers of alleles) were obtained from papers published in Molecular Ecology Notes (now renamed Molecular Ecology Resources) from 2001 to 2007. There are two main methods for obtaining SSR loci. The first, used mainly to screen for SSR loci, is to establish SSR-enriched genomic libraries, obtain the original sequences by sequencing the DNA clones, design SSR primer pairs, and test the polymorphism of the SSR loci among different individuals by PCR. The other method is to design SSR primer pairs directly from known sequences, such as EST sequences or other known DNA sequences containing SSR repeats. After electrophoresis of the PCR products in gels, the number of SSR alleles is determined by counting the amplified bands that fall within the expected size range of the PCR products. Hence, the resolving power of gel electrophoresis determines the number of alleles detected. In the articles published in Molecular Ecology Notes, three kinds of methods are used for detecting PCR products. In the most frequently used method, and perhaps the most accurate, PCR amplification products are separated on an automated DNA sequencer and alleles are sized with software such as Gene Profiler (Scanalytics, Fairfax, VA, USA), Genescan software (ABI Prism) or Genotyper (ABI Prism).
In the second most frequently used method, the PCR products are run on 6%-8% denaturing polyacrylamide gels stained with silver nitrate, and the length of the PCR products is estimated with 10-100 bp ladder markers. Additionally, PCR products may be separated on 2%-3% agarose gels and visualized with UV light after ethidium bromide staining, or by autoradiography. To ensure the accuracy of the number of alleles, only data obtained with the first method were used to analyse the variation discrepancy of dinucleotide microsatellite repeats in this paper. Moreover, only data based on more than 20 individuals were considered valid for the statistical analysis.
The repeat motifs and their copy numbers, the numbers of alleles and their sequence sizes were entered into Microsoft Excel sheets according to the taxonomic classification.

Statistical criterion
Both perfect and imperfect repeats (only one motif, excluding the compound type) (Webber, 1990) of dinucleotide repeat types were considered in the study of SSR variation. Because of base complementarity, and because the choice of the first base when recording a repeat unit is arbitrary, equivalent repeat types were pooled into a single type. For dinucleotide repeats, AT includes AT and TA; AC includes AC, CA, TG and GT; AG includes AG, GA, TC and CT; and CG includes CG and GC (Kong & Gao, 2005).
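The pooling rule described above can be sketched in a few lines of Python. This is only an illustration of the idea (equivalence under cyclic shift and reverse complement); the function names are hypothetical and are not taken from the paper or from any library.

```python
# Illustrative sketch only: these helper names are not from the paper.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(motif):
    """Return the motif as read on the opposite DNA strand."""
    return "".join(COMPLEMENT[base] for base in reversed(motif))


def rotations(motif):
    """All cyclic shifts of a motif (AC and CA describe the same tandem array)."""
    return {motif[i:] + motif[:i] for i in range(len(motif))}


def canonical_class(motif):
    """Map a dinucleotide motif to its pooled class: AT, AC, AG or CG."""
    motif = motif.upper()
    if len(motif) != 2 or any(base not in COMPLEMENT for base in motif):
        raise ValueError(f"not a dinucleotide motif: {motif!r}")
    if motif[0] == motif[1]:
        raise ValueError(f"{motif!r} is a mononucleotide run, not a dinucleotide repeat")
    equivalents = rotations(motif) | rotations(reverse_complement(motif))
    # The alphabetically smallest member coincides with the labels used in the text.
    return min(equivalents)


# Pool the twelve possible dinucleotide motifs into the four classes.
pooled = {m: canonical_class(m) for m in ["AT", "TA", "AC", "CA", "TG", "GT",
                                          "AG", "GA", "TC", "CT", "CG", "GC"]}
```

Taking the alphabetically smallest equivalent is a convenience of this sketch; any fixed representative per class would serve equally well.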
The taxonomic groups were as follows: primates, mammalia (excluding both primates and rodentia), rodentia, birds, fish, amphibians and reptiles, insects, molluscs, plants and fungi. In Whittaker's classification system (1969), Kingdom Fungi, Kingdom Plantae and Kingdom Animalia are parallel. In Kingdom Animalia, the highest-ranking taxonomic group is Mammalia, which includes Primates, Artiodactyls, Rodentia and others, and within which primates rank highest. The other groups, in decreasing order of rank, are birds, amphibians and reptiles, fish, molluscs and insects.
To reduce the error that extremely low values of the number of alleles at some loci might introduce into the analysis, extreme values were tested for in each taxonomic group using the following method (Wang & Yang, 2005): after excluding the doubtful value ($x_d$) of the number of alleles, δ was calculated from the remaining values, and $x_d$ was discarded if it met the rejection criterion.
The mutation slopes of perfect and imperfect SSRs were compared using covariance analysis. The relationship between repeat motif types and copy numbers was examined by correlation analysis. SSR variation among taxonomic groups was compared by variance analysis. All of the above statistical analyses were performed with SPSS 14.0 software (SPSS Inc., Chicago, IL).

The categorization and composition of repeats in total
In total, 4,960 dinucleotide microsatellite repeats were found in 674 organisms. Insects were the most numerous (144), accounting for 21.36% of all organisms, followed in decreasing order by plants, fish, birds, amphibians and reptiles, rodentia, mammalia, molluscs, fungi and primates (Table 1). The number of perfect and imperfect repeats was 4,212, representing 84.92% of the total of 4,960 dinucleotide repeats.
Among the dinucleotide repeats, AC was the most abundant (2,833, accounting for 67.26% of the 4,212 dinucleotide repeats), followed by AG (1,333, or 31.65%). AT and CG repeats were rare, especially CG, of which there were only 4 (Table 2).
Notes: Pr, Ma, R, B, F, AR, I, Mo, Pl and Fu refer to Primates, Mammalia, Rodentia, Birds, Fishes, Amphibians and Reptiles, Insects, Molluscs, Plants and Fungi, respectively, here and below; di- denotes the dinucleotide repeats, here and below.
In each taxonomic group, extreme values of the number of alleles were tested for and deleted from the corresponding data set. Table 2 shows that the criterion for an extreme value differs considerably among taxonomic groups: for example, it is ≥11 in fungi but ≥38 in molluscs. Thus, the number of repeats deleted differed among taxonomic groups, as can be seen in Table 2.
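The proportions quoted in this section can be re-derived from the counts given in the text. The short check below is a sketch; the AT count is not stated in the text and is obtained here by subtraction, under the assumption that the four motif classes together make up the 4,212 perfect and imperfect repeats.

```python
# Counts as stated in the text.
TOTAL_REPEATS = 4960        # all dinucleotide repeats
PERFECT_IMPERFECT = 4212    # perfect + imperfect repeats
motif_counts = {"AC": 2833, "AG": 1333, "CG": 4}

# Assumption (not stated in the text): the four classes partition the 4,212
# repeats, so the AT count is whatever remains.
motif_counts["AT"] = PERFECT_IMPERFECT - sum(motif_counts.values())


def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


share_perfect_imperfect = percent(PERFECT_IMPERFECT, TOTAL_REPEATS)
motif_shares = {motif: percent(count, PERFECT_IMPERFECT)
                for motif, count in motif_counts.items()}
```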

The comparison of perfect and imperfect SSR mutation slopes
According to Webber's criterion (Webber, 1990), SSRs can be classified into three types: perfect, imperfect and compound. The origin of compound SSRs is more complicated than that of the perfect and imperfect types, because it involves point mutation of existing SSR repeats followed by slippage mutation (Kong & Gao, 2005). Thus, only perfect and imperfect repeats were considered when comparing SSR variation across species.
Nevertheless, perfect and imperfect repeat types might differ in mutation rate; for example, an AG repeat in one species may resemble an AG repeat in another species without the two dinucleotide repeats being identical by descent.
If the slopes of the regression equation, in which the copy number of repeat motifs is the independent variable and the number of alleles is the dependent variable, are identical for perfect and imperfect repeat types in a given taxonomic group, the mutation rates of the two repeat types are expected to be identical. Otherwise, perfect and imperfect repeats must be considered separately when describing SSR variation.
Using SPSS, with copy number as the independent variable, covariance analysis showed that the difference in slope between perfect and imperfect repeats was not significant in any group tested (P>0.05) (Table 3). Although no comparison was made in the other groups because the data were insufficient (for example, fewer than 10 imperfect AC repeats in primates), the above results indicate no detectable difference in mutation rate between perfect and imperfect repeats. Thus, the data for perfect and imperfect repeats of the same repeat type were combined to analyze the variation of dinucleotide repeats in the following text.
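The slope comparison described above can be illustrated with a standard homogeneity-of-slopes F test: a model with one common slope is compared against a model with a separate slope per repeat type. The sketch below is not the SPSS procedure used in the paper; the function names are hypothetical and the data are invented purely for illustration. It returns the F statistic and its degrees of freedom; a P value would then be read from the F distribution.

```python
def centered_sums(xs, ys):
    """Within-group sums of squares and cross-products (Sxx, Sxy, Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxx, sxy, syy


def slope_homogeneity(groups):
    """F test for equal regression slopes across groups.

    groups: list of (copy_numbers, allele_counts) pairs, one per group.
    Returns (per-group slopes, F statistic, df1, df2).
    """
    sums = [centered_sums(xs, ys) for xs, ys in groups]
    slopes = [sxy / sxx for sxx, sxy, _ in sums]

    # Full model: separate intercept and slope for each group.
    sse_full = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in sums)

    # Reduced model: separate intercepts, one common slope.
    total_sxx = sum(s[0] for s in sums)
    total_sxy = sum(s[1] for s in sums)
    total_syy = sum(s[2] for s in sums)
    sse_reduced = total_syy - total_sxy ** 2 / total_sxx

    df1 = len(groups) - 1
    df2 = sum(len(xs) for xs, _ in groups) - 2 * len(groups)
    f_stat = ((sse_reduced - sse_full) / df1) / (sse_full / df2)
    return slopes, f_stat, df1, df2


# Hypothetical (copy number, number of alleles) data, invented for illustration.
perfect = ([8, 12, 16, 20, 24], [3, 5, 6, 9, 10])
imperfect = ([9, 13, 17, 21, 25], [4, 5, 7, 8, 11])
slopes, f_stat, df1, df2 = slope_homogeneity([perfect, imperfect])
```

A small F statistic relative to the critical value for (df1, df2) corresponds to the non-significant slope differences reported in the text.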

Relationships between number of alleles and repeat motif types
Here, only two kinds of repeats, AG and AC, were used to analyze the relationship between repeat motif types and their corresponding variation. The numbers of AT and CG repeats were too low to meet the requirements of the analysis. For example, only 10 AT repeats were found in fungi, and these were limited to only six copy number classes. The arithmetic mean of the number of alleles was taken as the number of alleles corresponding to each copy number (Table 4). An independent-samples t-test was performed to compare the mean number of alleles between the AC and AG repeat types. Levene's test for equality of variances showed that the variances of the AC and AG samples were equal (F=1.988, P=0.163>0.05). In the two-tailed test, no significant difference was found between the two repeat types (P=0.091>0.05). Additionally, AC and AG repeats were compared within each of the following taxonomic groups: fish, amphibians and reptiles, insects, molluscs, plants and fungi. The results showed no significant differences between AC and AG repeats in any of these taxonomic groups.
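For readers who want to see what the equal-variance t-test computes, a minimal sketch follows. It uses only the Python standard library, the function name is hypothetical, and the numbers are invented; it is not the SPSS output reported above. The sketch returns the t statistic and degrees of freedom; the two-tailed P value would be obtained from the t distribution, and Levene's test is not reproduced here.

```python
import math
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Two-sample t statistic assuming equal variances (pooled variance).

    Returns (t, degrees_of_freedom).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_err, df


# Hypothetical mean allele numbers per copy-number class, for illustration only.
ac_means = [4.1, 5.0, 6.2, 7.4, 8.8, 9.5]
ag_means = [3.6, 4.9, 5.8, 6.1, 7.9, 8.2]
t_stat, df = pooled_t(ac_means, ag_means)
```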

Variation analysis among taxonomic groups
The copy number and the corresponding number of alleles in each taxonomic group for AG and AC repeats are given in Table 5. With taxonomic group as the fixed factor and the number of alleles as the dependent variable, a one-way ANOVA was carried out to study the variation of AC and AG repeats among taxa. The results show significant differences among taxa for both AC (F=8.304, P<0.001) and AG (F=17.804, P<0.001). Following Levene's test of equality of error variances, Dunnett's T3 method was used for the multiple comparisons among taxonomic groups for both AC and AG repeats. For AC repeats, the comparisons that yielded significant differences are shown in Table 6. In decreasing order of the mean number of alleles, the groups were molluscs, fish, insects, amphibians and reptiles, plants, rodentia, birds, primates, mammalia and fungi. Table 6 shows that fungi differed significantly from almost all other taxonomic groups, the exception being mammalia.
For AG repeats, the significant differences found in the multiple comparisons are shown in Table 7. Here, only six taxonomic groups were compared with each other: fish, amphibians and reptiles, insects, molluscs, plants and fungi. In decreasing order of the mean number of alleles, the groups were molluscs, fish, plants, insects, amphibians and reptiles, and fungi. As with the AC repeats, fungi differed significantly from almost all other taxonomic groups, the exception being amphibians and reptiles. It is worth noting that molluscs had the highest mean number of alleles for both AC and AG repeats.
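The one-way ANOVA used in this section compares the variation between group means with the variation within groups. The sketch below shows the F statistic computed by hand; the function name and the allele counts are hypothetical, and the Dunnett's T3 post hoc comparisons reported in the text are not reproduced.

```python
def one_way_anova(groups):
    """One-way ANOVA F statistic.

    groups: list of lists, one list of observations per taxonomic group.
    Returns (F, df_between, df_within).
    """
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(group) * (m - grand_mean) ** 2
                     for group, m in zip(groups, group_means))
    # Variation of observations around their own group mean.
    ss_within = sum((value - m) ** 2
                    for group, m in zip(groups, group_means)
                    for value in group)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical numbers of alleles per locus in three groups, for illustration only.
alleles_by_taxon = {
    "molluscs": [12, 15, 9, 14, 11],
    "fish": [10, 8, 13, 9, 11],
    "fungi": [3, 5, 4, 6, 2],
}
f_stat, df_between, df_within = one_way_anova(list(alleles_by_taxon.values()))
```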

The correlation between the number of alleles and copy number in each taxonomic group
Using SPSS, correlation analysis between the number of alleles and copy number (the original data are in Table 5) was performed to explore whether there are differences among organisms. The results are shown in Table 8. For AC repeats, the number of alleles correlates with the copy number in almost all taxonomic groups, the exceptions being primates and fungi. For AG repeats, however, a significant correlation existed only in molluscs, plants and fungi.
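The correlation analysis can be illustrated with a hand-written Pearson coefficient and the usual t statistic for testing whether it differs from zero. This is a sketch with hypothetical function names and invented data, not the SPSS procedure or the values in Table 8.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_t(r, n):
    """t statistic (n - 2 degrees of freedom) for testing r against zero.

    Undefined when |r| == 1.
    """
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical copy numbers and mean numbers of alleles, for illustration only.
copy_numbers = [6, 9, 12, 15, 18, 21, 24]
mean_alleles = [3.2, 4.1, 5.5, 5.9, 7.8, 8.1, 9.6]
r = pearson_r(copy_numbers, mean_alleles)
t_stat = correlation_t(r, len(copy_numbers))
```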
Our statistical data (Table 2) are similar to what the above literature reported. Thus, the data used in our analysis are reliable, because they agree with the natural distribution of dinucleotide repeats in eukaryotic genomes.
It is worth noting that AT repeats are usually very abundant in many taxonomic groups, such as humans, plants, fungi (Tóth et al, 2000), shrimps (Kong & Gao, 2005) and insects (Subirana & Messeguer, 2008; Prasad et al, 2005). However, AT repeats are scarce in the investigated data. This paucity might result from the SSR isolation methods, in which (AC)n and (AG)n probes are largely used to screen the DNA fragment libraries (An et al, 2006; Funk et al, 2006). The paucity of CG repeats, in contrast, results not only from the isolation methods, in which (CG)n probes are rarely used, but also from the low content of CG repeats in the genomes of most organisms, such as shrimps (Kong & Gao, 2005) and fish (Edwards et al, 1998).

The longer the repeat sequences, the higher the mutation rates
Microsatellite mutation rates range from $10^{-6}$ to $10^{-2}$ per generation, and are thus significantly higher than base substitution rates (Schlötterer, 2000). Several factors that may contribute to microsatellite mutation rates in the evolutionary dynamics of microsatellites have been suggested: the sequence of the repeat motif, repeat number, length of the repeat unit, flanking sequence, interruptions in the microsatellite, recombination rate, transcription rate, etc (Schlötterer, 2000).
TABLE 8
The results of correlation analysis between the number of alleles and copy number
Some comparisons of mutation rates among different repeat types have been made; for example, the mutation rate of dinucleotide repeats was found to be higher than that of tetranucleotide repeats (Chakraborty et al. 1997). In contrast, a recent study showed that the mean population mutation rates of microsatellites do not differ significantly among di-, tri- and tetra-nucleotide motifs in rice (Gao & Xu, 2008). However, the variability of SSRs among repeat motifs, such as between AT and AG repeats, is not yet fully understood across taxonomic groups, although the studies of Bachtrog et al (2000) showed that GT/CA microsatellites have a mutation rate 1.4 times as high as that of TC/AG and 3.0 times as high as that of AT/TA microsatellites in Drosophila melanogaster. Our results revealed no significant differences between AC and AG repeats overall. Moreover, the differences between AC and AG repeats were not significant in the following taxonomic groups: fish, amphibians and reptiles, insects, molluscs, plants and fungi.
Although no significant differences in SSR variability were found between the AG and AC repeat motif types, there were prominent positive correlations between copy number and number of alleles for the AC (Pearson correlation coefficient 0.858, P<0.01) and AG (Pearson correlation coefficient 0.704, P<0.01) motifs. Many previous studies, including direct observations from pedigrees (Schlötterer et al, 1998), population surveys (Goldstein & Clark 1995; Gao & Xu, 2008) and analysis of cloned microsatellites (Wierdl et al. 1997), have suggested that the mutation rate of microsatellites increases with repeat number. Here, we used the number of alleles directly as the measure of SSR variation and arrived at the same conclusion. The results provide further support for the view that the mutation rate of SSRs increases with the copy number of repeat motifs.
Furthermore, the number of alleles increases linearly with the copy number of repeats (at least in the range of 5-39 copies), as can be seen in Figure 1. The results support the strand-slippage theories used to explain the genesis and development of SSRs (Levinson & Gutman, 1987), since longer repeat sequences would result in more mutations. However, the growth of a microsatellite sequence is not unlimited, and the evidence indicates that it is checked by the accumulation of base substitutions and point mutations (Kruglyak et al, 1998; Kruglyak et al, 2000). This may explain why the correlation between the number of alleles and the copy number is not significant for AC repeats in fungi and primates, which rank lowest and highest, respectively, among the taxonomic groups considered. In primates, the mutation rates of SSRs have held at a relatively steady level over long evolutionary time periods, whereas in fungi the shorter repeats limit rapid mutation of SSRs (Schlötterer et al, 2000).

Relationships between variation of SSR and evolution of genomes
Many studies (Grover et al, 2007; Tóth et al, 2000; Mrázek et al, 2007; Katti et al, 2001) have shown that the composition and distribution of microsatellite repeats vary among the genomes of different organisms. These results suggest that SSR evolution in genomes might be closely related to the evolutionary rank of species. Schlötterer (2000) proposed that species with short microsatellites, such as D. melanogaster, should have a lower microsatellite mutation rate than species with longer microsatellites. This hypothesis is supported by the data for fungi in our study, in which the microsatellite sequences are shorter than in the other taxonomic groups. Tables 6 and 7 also show that fungi differ significantly from almost all other taxonomic groups, the exceptions being mammalia for AC repeats and amphibians and reptiles for AG repeats.
Excluding fungi, it seems that the variability of SSRs in the genomes of low-ranking species is greater than that in high-ranking species. For example, in decreasing order of the mean number of alleles, the groups for AC repeats were molluscs, fish, insects, amphibians and reptiles, plants, rodentia, birds, primates and mammalia; for AG repeats, they were molluscs, fish, plants, insects, and amphibians and reptiles. One possible explanation is that the genomes of low-ranking species are usually smaller than those of high-ranking species, and that microsatellites in low-ranking species are correspondingly shorter. According to Lai and Sun (2003), when slippage mutations happen, expansion occurs more frequently in short microsatellite repeats than in long ones, whereas contraction occurs more frequently in long repeats than in short ones. Thus, the variability of shorter SSRs in genomes would be greater than that of longer SSRs.
Of course, the differences in SSR variation observed among taxonomic groups can result from a number of factors. For example, each taxonomic group includes many species, and there are also differences among related species (Schlötterer & Harr, 2000). Additionally, species differ in the age of the phylum, in whether diversification is old or recent, in the evolution of DNA repair systems, in mating systems and so on. A further and more detailed study is needed to validate the hypothesis that, excluding fungi, the variability of SSRs in the genomes of low-ranking species is greater than that in high-ranking species.

Figure 1: The increase in allele numbers following the increase of copy numbers

TABLE 1
The distribution and composition of dinucleotide repeats among taxonomic groups

TABLE 2
The distribution characterization of dinucleotide repeat motif types
Note: EV is the abbreviation of Extreme Value, i.e. an abnormal number of alleles.

TABLE 7 The
Note: * The mean difference is significant at the 0.05 level.