Comparative genetic structure in pines: evolutionary and conservation consequences

Pines have been the focus of several studies that estimate population genetic parameters using both allozymes and chloroplast simple sequence repeats (SSRs). The genus has also recently been studied using molecular systematics, so that we now have a clearer understanding of its evolutionary history. With this background, we carried out a comparative study of genetic structure in pines. Expected heterozygosity is remarkably constant, with a 99 % confidence interval between 0.19 and 0.23 in the species that have been studied so far using allozymes. A significant proportion of species (9/41) show high population differentiation estimates (F ST equal to or larger than 0.15), and five of these have large, wingless seeds, probably associated with low densities, bird dispersal mechanisms and resistance to water stress. These species include the North American pinyon pines. Outcrossing rates are also constant among species from both subgenus Pinus and subgenus Strobus, which probably reflects a selective limit to the amount of deleterious alleles that can be maintained in pine species; this also affects inbreeding levels. We also explored the published microsatellite data for pines and conclude that these markers uncover a higher proportion of variation and genetic differentiation, as expected, and that the evolutionary models used to derive the population genetic structure estimators should take into account other sources of mutation (point mutations, larger insertions and/or deletions, and duplications) to better understand the comparative applications of these molecular markers.

shown some generalizations. The first one refers to the inverse relationship between outcrossing rates and genetic differentiation (Govindaraju 1988), so that species of tropical trees or trees in general show an outcrossing rate close to one and genetic differentiation estimates (usually estimated through the variance of allelic frequencies or F ST) very close to zero (e.g., Furnier & Adams 1986, Eguiarte et al. 1993, Ledig 1998).
Hamrick and coworkers (e.g., Hamrick & Godt 1990) have made several generalizations about the relationship between genetic structure and life history characteristics. An extraordinary homogeneity in expected heterozygosity has been observed. This pattern has been explained in light of Ohta's nearly neutral theory of molecular evolution (Ohta 1995). Also, variation in outcrossing rates has shown a bimodal distribution. Schemske & Lande (1985) first described this bimodality. Other generalizations show that trees, perennial plants and widespread species have larger genetic variation. In particular, most pine species have been studied in terms of their genetic structure and the above results have been confirmed in this genus (Ledig 1998). Some exceptions to the above generalizations are represented by species that are rare or that have passed through a bottleneck, like Pinus torreyana or Pinus resinosa. Contrary to expectations based on their rarity (but see Comps et al. 2001), studies of the genetic structure of Mexican and North American pinyon pines have shown high genetic variation (heterozygosities between 0.216 and 0.220), except for P. edulis (0.03, Premoli et al. 1994), and high population differentiation (F ST between 0.18 and 0.25). This genetic structure coincides with the one found in other rare conifer species like Picea chihuahuana (Ledig et al. 1997) and suggests that the outcrossing rate is less than one, as found for the few species where it has been estimated (Ledig et al. 1999 for P. maximartinezii, Ledig et al. 2001 for P. pinceana). These data can now be studied under a phylogenetic approach. In particular, recent phylogenetic studies of pines using molecular markers (e.g., Liston et al. 1999) can be used as a historical framework to ask questions about the origin and adaptive value of the genetic structure.
Pines represent an interesting system to study the effect of different ecological characteristics on genetic structure. This genus grows naturally in the northern hemisphere, and species can be found as far north as latitude 71º N (e.g., Pinus sylvestris), in ecosystems where annual rainfall is only 300-400 mm (e.g., P. pinceana), and with longevities that reach 2,500 to 5,000 years (e.g., P. aristata and P. longaeva). The great diversity of environments colonized by this genus and its local abundance make pines a key ecological element in most temperate forests of the northern hemisphere. Until now 110 species have been described. Seventy-one of those species grow in America and about 50 in Mexico.
This paper presents a comparative analysis of pine genetic structure using allozymes and chloroplast microsatellites. In particular, the analysis is presented using a phylogenetic approach incorporating recent molecular systematic data. Furthermore, we explore the relevance and evolutionary implications of the use of microsatellites as a comparative tool to further our understanding of the causes and patterns of the genetic structure of these trees.

Molecular markers
Different markers have been used in plants, and in particular in trees, to understand their genetic structure. The main questions usually asked when evaluating their ability to estimate genetic variability are how well they sample the genome and how sensitive they are in detecting genetic variation. The answer to the first question depends on the way the markers are distributed in the genome and on the number of loci used. The answer to the second depends directly on the number of base pairs that are sampled. In pines, allozymes have been the markers used in most studies. Approximately 50 species have been studied with these markers. In some cases all three basic estimates of population genetic structure have been obtained (expected heterozygosity, genetic differentiation or F ST, and outcrossing rates). In all studied species at least an estimate of expected heterozygosity was obtained. Recently, chloroplast microsatellites have been used to study these aspects of genetic structure, and data for at least 10 species have been published. However, no outcrossing rate estimates using nuclear microsatellites have been published. Because of the structure of the data, we restricted our analysis to data on allozymes and chloroplast microsatellites, which now represent an important source of information for a comparative analysis.

Allozymes
These markers have provided most of the data that we now have on the genetic structure of plants in general and of trees in particular. The polymorphisms are based on differential mobility on a particular electrophoretic support, usually starch. The amount of the genome that allozymes sample depends on the number of loci used, and their sensitivity is about one fourth of the number of bases sampled. That is, only one base change in four is detected in a gel. For example, a study with 20 allozyme loci samples about (1000 x 20) / 4, or 5,000 nucleotides, assuming that an average gene has 333 amino acids. These are codominant markers.
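The arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration of the back-of-the-envelope calculation; the function name and its parameters are hypothetical and are not part of the original study.

```python
def nucleotides_sampled(n_loci, amino_acids_per_gene=333, detectable_fraction=0.25):
    """Rough number of nucleotides effectively screened by an allozyme survey.

    Assumptions (taken from the text): an average gene codes for about
    333 amino acids, i.e. about 1,000 bp, and only one base change in
    four is detectable on a gel.
    """
    bases_per_gene = amino_acids_per_gene * 3  # three nucleotides per codon
    return n_loci * bases_per_gene * detectable_fraction


# A survey with 20 loci, as in the example in the text (about 5,000 nucleotides)
print(nucleotides_sampled(20))
```

With 333 codons per gene the result is 4,995, which the text rounds to 5,000.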

Microsatellites
The simple sequence repeats (SSRs) are DNA sequences that are repeated in tandem a number of times. The larger repeats (up to 5 Mb) are called satellites. Intermediate repeats (the repeated unit is more than 10 bp and forms blocks of 0.5 to 30 kb) are called minisatellites. Microsatellites have repeat units of 1-8 bp and form structures of 20 to 100 bp. The number of bases sampled with microsatellites depends on the number of loci used but is in general very low, because a region of high variation is sampled and a much lower number of base pairs is needed to estimate the same amount of variation.

Evolutionary models and their assumptions
The genetic estimates of heterozygosity and genetic differentiation depend on the genetic model that is assumed to have operated during the evolution of the molecular markers. An infinite allele model is usually assumed if allozymes or RFLPs (restriction fragment length polymorphisms) are used, but in the case of microsatellites there are good reasons to use a stepwise mutation model based on the way microsatellites are thought to mutate. In particular, the mechanism thought to be important in these markers is slipped-strand mispairing (Li 1997), which produces an increase or a decrease of one repeat unit (usually a base). Recently many groups have questioned the application of this model for several reasons (Hedrick 1999, Balloux et al. 2000). The first is the possibility of homoplasy or allele convergence. The second refers to the possibility that the mutation mechanism produces alleles that are more than one repeat unit away from the original allele. This could happen through insertions or deletions of more than one repeat unit. This process violates the assumptions of the model and makes the estimates biased in the same proportion. As a consequence, both kinds of estimates are regularly used: those based on a stepwise mutation model (Ohta & Kimura 1973, Valdés et al. 1993, Slatkin 1995) and those based on an infinite allele model (based on the estimation model originally developed by Wright [1949] and Kimura & Crow [1964] and further developed by Weir & Cockerham [1988]).
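To illustrate why the choice of mutation model matters, the following minimal sketch compares the classical equilibrium expectations for heterozygosity under the two models: H = θ / (1 + θ) for the infinite allele model (Kimura & Crow 1964) and H = 1 - 1 / sqrt(1 + 2θ) for the strict one-step stepwise model (Ohta & Kimura 1973). It assumes a neutral diploid nuclear locus at mutation-drift equilibrium with θ = 4Neμ; the function names are hypothetical, and the scaling of θ would differ for haploid, uniparentally inherited chloroplast markers.

```python
import math


def h_infinite_alleles(theta):
    """Equilibrium expected heterozygosity under the infinite allele model."""
    return theta / (1.0 + theta)


def h_stepwise(theta):
    """Equilibrium expected heterozygosity under the one-step stepwise model."""
    return 1.0 - 1.0 / math.sqrt(1.0 + 2.0 * theta)


# theta = 4 * Ne * mu (assumed scaling for a diploid nuclear locus)
for theta in (0.1, 1.0, 10.0):
    print(theta, round(h_infinite_alleles(theta), 3), round(h_stepwise(theta), 3))
```

For the same θ the stepwise model predicts a lower heterozygosity, because recurrent mutation can regenerate allelic states that already exist (homoplasy).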
We used a comparative approach to analyze the genetic data on pines. First, we used standard statistical tests to compare the estimates (Sokal & Rohlf 1994). Second, we grouped the data with respect to their taxonomic status using the classification proposed by Price et al. (1998). Finally, we used a published phylogenetic framework for pines (Liston et al. 1999) to further clarify historical patterns in genetic estimates.

Expected heterozygosity and genetic differentiation
Genetic variation estimated through the expected heterozygosity has been obtained for approximately 50 pine species. Ledig (1998) has published a recent review. For 38 species, both heterozygosity and genetic differentiation (using F ST or G ST) have been published. In all cases allozymes were used as genetic markers. To these data we added some that were not included in Ledig's review (Parker & Hamrick 1996, Delgado et al. 1999, Ledig 1999, Ledig et al. 1999, 2001, Molina-Freaner et al. 2001). These data show a distribution close to a Gaussian curve but, statistically, only the expected heterozygosity fits a normal distribution (Fig. 1). Genetic differentiation does not fit such a distribution, mainly because there is an excess of observed values close to zero. Averages for the genus are 0.198 for the expected heterozygosity and 0.129 for the genetic differentiation (Table 1). On the other hand, genetic differentiation estimates are not significantly different (t 39 = 0.21, P > 0.5) between hard (0.127) and soft pines (0.136). The high variation among species in the expected heterozygosity estimates in both the Pinus and the Ponderosae subsections is noteworthy. No significant correlation was detected (r = 0.03, d.f. = 72, P > 0.05) between the expected heterozygosity and genetic differentiation (Fig. 2). This correlation remains low when data are partitioned among soft (r = 0.04, d.f. = 16, P > 0.05) and hard pines (r = 0.07, d.f. = 72, P > 0.05).
When multiple estimates have been published for a particular species, the species is probably better characterized by the largest published estimate, because of subsampling of variation. That is, if a study reports a larger F ST estimate, this estimate is probably closer to the species mean, as some studies do not have a representative sample of both individuals and populations. The correlation between expected heterozygosity and F ST improves slightly when using these data (r = 0.24, d.f. = 39, P > 0.05) but it remains statistically nonsignificant.
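The significance statements attached to the correlation coefficients can be checked with the standard t transformation of Pearson's r, t = r * sqrt(d.f. / (1 - r^2)). The sketch below is a hedged illustration using only the r and d.f. values reported in the text; the function name is hypothetical, and the critical value of about 2.02 (two-tailed, 5 %, around 39 to 40 d.f.) is an approximate table value supplied here as an assumption.

```python
import math


def t_from_r(r, df):
    """t statistic for testing a Pearson correlation against zero."""
    return r * math.sqrt(df / (1.0 - r * r))


# Values reported in the text: r = 0.24 with 39 degrees of freedom
t_value = t_from_r(0.24, 39)

# Approximate two-tailed 5 % critical value for about 39-40 d.f. (assumed table value)
T_CRIT_5_PERCENT = 2.02

print(round(t_value, 3), abs(t_value) > T_CRIT_5_PERCENT)
```

The resulting t is about 1.54, below the critical value, which agrees with the nonsignificant result reported above.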
When we compared the means of the estimates of genetic structure for the subsections in Table 1, some generalizations could be made. First, the expected heterozygosities in all subsections of soft pines are larger than the ones obtained for hard pines. Second, genetic differentiation in hard pines is in all cases smaller than the values estimated for soft pines. In particular P. halepensis, a hard pine, shows an extreme value of 1 %. This species is in subsection Halepenses. The estimate was obtained for populations in Greece (Loukas et al. 1983) and either represents a biased sample for the species or this species deserves future attention as such an extreme example of low population differentiation.
Estimates of genetic variation using microsatellites in pines range from 0.411 for P. heidrechii var. leucodermis (Powell et al. 1995) to 0.978 for P. sylvestris (Provan et al. 1998), with an average of 0.582 (Table 2).
On the other hand, estimates of F ST (using the infinite allele model) range between 0.023 for P. pinaster (Ribeiro et al. 2001)  In fact, both estimates are statistically correlated (r = 0.985, d.f. = 4, P < 0.01), which suggests that the differences between the two models are not important when comparisons are made in these pine species. Furthermore, except for the estimate for P. pinceana, a slightly negative correlation (statistically nonsignificant, r = 0.688, d.f. = 4, P > 0.10) is observed between genetic diversity and genetic differentiation measured as F ST.
Microsatellites can be used to amplify the same locus in different species. Table 3 shows averages for six species from the subgenus Pinus and four species from the subgenus Strobus for 11 microsatellite loci (Cuenca 2001, Escalante 2001, Delgado, Vendramin & Piñero unpublished results). The average sizes of microsatellites are usually very similar in the two subgenera except at one locus (Pt26081). Also, the variances for the two subgenera are, in general, quite similar.

Population inbreeding and mating system
Population inbreeding, or in other words the deviation from Hardy-Weinberg equilibrium, could have three causes. First, inbreeding could develop through self-fertilization. Second, mating could occur among genetically related individuals, which would also produce inbreeding. Third, genetic drift could increase the level of homozygosity. The published estimates of inbreeding (F IS) in pines range from close to zero to 0.14 for P. pinceana (Ledig et al. 2001). On the other hand, outcrossing rate estimates range from 0.65 for P. maximinoi (Matheson et al. 1989) to values statistically equal to 100 % outcrossing. The average of direct estimates using allozymes for 17 species (six from the subgenus Strobus and 11 from the subgenus Pinus) is 0.88 (SD = 0.10) (Ledig 1998, Ledig et al. 1999, 2001). This distribution is shown in Fig. 3 and fits a normal distribution (Kolmogorov-Smirnov test, d = 0.16, P > 0.20). The 95 % confidence interval is rather narrow (0.82 to 0.93). Average estimates for hard and soft pines are not statistically different (0.89 and 0.86, respectively). For species in which a direct estimate of the outcrossing rate has been obtained, we can calculate the expected inbreeding coefficient assuming that all inbreeding is due to the mating system (f = [1-t] / [1+t]) and compare both values (Table 4). All inbreeding estimates were obtained in seeds. In general, direct inbreeding estimates coincide with the expected ones based only on the mating system. In some cases, like P. sylvestris (Muona & Szmidt 1991) and P. pinceana (Ledig et al. 2001), both estimates are clearly different. This is probably produced by natural selection in favor of heterozygotes or overdominance (see Ledig 1998 for a review in pines). This effect probably has different intensity in different species and appears to act in earlier stages of the life cycle in some cases.
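The expected inbreeding coefficient used in this comparison, f = (1 - t) / (1 + t), is easy to evaluate for the outcrossing rates quoted in this section. The following is a minimal sketch; the function name is hypothetical, and the formula assumes, as stated in the text, that all inbreeding is due to the mating system (mixed selfing and random outcrossing at equilibrium).

```python
def expected_inbreeding(t):
    """Equilibrium inbreeding coefficient expected from the outcrossing rate t.

    Assumes a mixed mating system in which self-fertilization is the only
    source of inbreeding.
    """
    if not 0.0 < t <= 1.0:
        raise ValueError("outcrossing rate must be in the interval (0, 1]")
    return (1.0 - t) / (1.0 + t)


# Outcrossing rates quoted in the text: genus average and the lowest estimate
for label, t in (("genus average", 0.88), ("P. maximinoi", 0.65)):
    print(label, round(expected_inbreeding(t), 3))
```

An average outcrossing rate of 0.88 corresponds to an expected f of about 0.06, while the lowest reported rate (0.65) corresponds to about 0.21. Observed F IS values well below such expectations would point to forces other than the mating system, such as selection favoring heterozygotes.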

Fig. 2: Relation between expected heterozygosity and genetic differentiation in 71 studies in which both estimates were obtained for 38 pine species using allozyme markers. Circles are data for species in the subgenus Pinus (hard pines), squares are data for species in the subgenus Strobus (soft pines).
A historical framework can also be used to compare the mean outcrossing rates in different subsections. In all lineages, rates are in general high. Also, the lowest estimates correspond to P. cembra and P. maximinoi (0.69 and 0.65, respectively), which belong to different subgenera (Strobus and Pinus, respectively).

Constancy in heterozygosity among species
There are several sources of evidence that suggest that effective population size largely determines the level of allozyme variation in a species or, in other words, that within a species there is a correlation between expected heterozygosity and population size (Avise 1994). In the case of Pinus, very few species show estimates of expected heterozygosity different from 0.20. These results are predicted if mutation rates are assumed to be relatively constant (Hedrick 1999). In particular for Pinus, the 99 % confidence limits for expected heterozygosity in 41 species are 0.189-0.227, which suggests that historically effective population sizes have been constant among species or that they have been maintained above levels at which increases in population size would only marginally increase the level of genetic variation.
Another aspect of the life history that could affect the level of genetic variation is individual longevity. Although species have been described (particularly in subsection Balfourianae) in which individual longevity could reach thousands of years, it more commonly ranges from tens of years to a few hundred years. Longevity could be one of the causes of some of the observed data, such as the level of expected heterozygosity estimated for P. longaeva (0.340, Hiebert & Hamrick 1983).
In this context of relative constancy of estimates of expected heterozygosity, it is relevant to analyze in some detail those species that show a significant deviation from the mean. In general, the distribution shown in Fig. 1 indicates that species with expected heterozygosities below 0.1 are very few, and all of them (for example, P. torreyana, Ledig & Conkle 1983) are probably the consequence of a bottleneck. This phenomenon has also been described for Coulter pine (P. coulteri, Ledig 2000) at the intraspecific level.

Heterogeneity in population differentiation among species
Genetic differentiation shows a contrasting pattern from that found for expected heterozygosity. The distribution does not fit a normal distribution, and this is probably due to different causes. Theoretically, both migration and genetic drift determine population differentiation. In some cases, attention has been given to the fact that the estimation of genetic differentiation could be biased if the molecular markers show a high mutation rate (Hedrick 1999). Other aspects, like the ecological components of migration and genetic drift, could also affect the level of population differentiation. These include, for example, population density and the migration agents of both gametes and embryos (Alvarez-Buylla et al. 1996). These factors are quite variable in pines and would explain variation in estimates of genetic differentiation. For example, pinyon pines in North America share a low population density, relatively isolated populations and seed dispersal by birds. All these factors cause a relatively high population differentiation.
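The interplay between migration and drift described in this section is often summarized with Wright's island model, in which F ST is approximately 1 / (1 + 4Nm) at equilibrium, where Nm is the effective number of migrants per generation. The sketch below is an illustration only and is not an analysis performed in this paper; the function names are hypothetical, and the approximation assumes neutral diploid nuclear markers, many equally sized populations and negligible mutation.

```python
def fst_from_nm(nm):
    """Equilibrium F_ST under Wright's island model for Nm migrants per generation."""
    return 1.0 / (1.0 + 4.0 * nm)


def nm_from_fst(fst):
    """Number of migrants per generation implied by a given F_ST (island model)."""
    if not 0.0 < fst <= 1.0:
        raise ValueError("F_ST must be in the interval (0, 1]")
    return (1.0 / fst - 1.0) / 4.0


# F_ST values quoted in the text: genus average, high-differentiation threshold,
# and the upper value reported for pinyon pines
for fst in (0.129, 0.15, 0.25):
    print(fst, round(nm_from_fst(fst), 2))
```

Under these assumptions, the genus average of 0.129 corresponds to roughly 1.7 migrants per generation, whereas the value of 0.25 reported for some pinyon pines corresponds to fewer than one.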

Microsatellites: comparative approaches
Several studies have explored the patterns of microsatellite evolution. Although at the intraspecific level the interpretation of genetic variation is straightforward (slipped-strand mispairing producing one-step mutations at high rates), at the interspecific level, particularly when more divergent species are compared, other factors could be involved. This is because different studies have reported other mechanisms that should be taken into account (Kruglyak et al. 1998, Karhu et al. 2000). Although slipped-strand mispairing was first described as the main mechanism for microsatellite mutation, it has been demonstrated that base substitutions could account for a significant portion of observed substitutions when different species are compared (Kruglyak et al. 1998). Mutation rates of these two kinds of events differ by several orders of magnitude. While point mutations occur at rates between 10⁻¹⁰ and 10⁻⁸, mutations in microsatellites that either increase or decrease the number of repeats have been estimated at between 10⁻⁵ and 10⁻². In particular, SSR mutation rates of 10⁻⁵ have been found in pines (Provan et al. 1999). These results show that there must be a difference of about 4 orders of magnitude between the average allelic substitution times for mutations based on the number of repeat copies in a microsatellite and those based on point mutations. As a result, transient polymorphisms from both mutation mechanisms will be present in different populations, but allele fixation originating from point mutations will occur with higher frequency among divergent species. The comparison of alleles among divergent species has shown this (Karhu et al. 2000), and has also revealed a third kind of substitution (besides size mutations and point mutations): duplications, which will also be present in highly divergent species. These mutation mechanisms should be better studied in order to interpret the observed polymorphisms.

TABLE 4
Inbreeding coefficients (F IS) and inbreeding estimates (f) derived from the outcrossing rate (t m) for different pine species

Fig. 3: Frequency distribution of outcrossing rates in pine species from published data using allozymes.
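The difference of about 4 orders of magnitude mentioned in this section follows directly from the quoted rates. As a small illustrative sketch (the function names are hypothetical, and taking the geometric midpoint of the point mutation range is an assumption made here only for the sake of the example):

```python
import math


def orders_of_magnitude(rate_a, rate_b):
    """Difference between two rates expressed in powers of ten."""
    return math.log10(rate_a / rate_b)


def geometric_midpoint(low, high):
    """Geometric mean of the two ends of a range of rates."""
    return math.sqrt(low * high)


ssr_rate_pines = 1e-5                            # SSR mutation rate reported for pines
point_rate = geometric_midpoint(1e-10, 1e-8)     # midpoint of the quoted range, about 1e-9

print(round(orders_of_magnitude(ssr_rate_pines, point_rate), 1))
```

Using the ends of the point mutation range instead of its midpoint gives differences of 3 to 5 orders of magnitude.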
The relative proportion of mutation rates with respect to migration rates has been used to explain the incongruence seen in F ST estimates using uniparental and biparental markers, in particular to explain lower F ST values for uniparental markers (Balloux et al. 2000). In pines, differentiation would be expected to be higher in maternally inherited markers (mitochondria) than in paternally inherited markers (chloroplast) (Furnier & Stine 1995). The lowest differentiation would then be expected in nuclear markers. There are reports in the literature of genetic differentiation using nuclear allozymes and chloroplast microsatellites. These data show that there is a positive (but statistically nonsignificant, r = 0.86, d.f. = 2, P > 0.10) relation between genetic differentiation estimates from these markers when nuclear data are used as the independent variable for P. pinaster, P. pinceana, P. halepensis and P. heidrechii var. leucodermis (Table 2, Ledig 1998, Ledig et al. 2001). They also show that differentiation estimates using allozymes are always lower than those obtained from chloroplast microsatellites, as was also found for Picea glauca by Furnier & Stine (1995).
Results shown in Table 3 support a more complex microsatellite evolutionary model than expected. Under a model with genetic drift and mutation, the expected size of microsatellites from different species would diverge at a rate proportional to both mutation rate and population size, and as a consequence the variance among species would be larger. Microsatellite average sizes for different hard and soft pine species are equal in six of the 11 loci. In the other five loci there is a significant difference, but in two of those loci (Pt26081 and Pt87268) the average size is larger in pines from the subgenus Strobus, while in the other three (Pt4821, Pt63718 and Pt71936) it is larger in pines from the subgenus Pinus. The other prediction would be that the variance for both groups of pines taken together would be larger than the variances for each one of the subgenera. Variances are statistically larger at the 5 % level (Bartlett's test) in these five loci when all species are taken together. This probably shows that size mutation appears to be controlling the evolutionary dynamics of microsatellites at these five loci, but it also suggests that another factor should explain both the variance homogeneity and the homogeneity of microsatellite sizes at the other six loci. Selection would explain both homogeneities, and further studies should explore this possibility.
The most representative parameter of the genetic structure in plants is the rate of outcrossing, estimated as the proportion of seed produced through self-fertilization with respect to those produced through cross-fertilization (Govindaraju 1988). The mating system has consequences for inbreeding levels and, through them, for the different factors that produce inbreeding depression. For example, a species with a high outcrossing rate usually has a high frequency of recessive deleterious and lethal alleles, and inbreeding depression will be high if self-fertilization increases. These species will normally have decreasing inbreeding coefficients (an increase in heterozygote proportions) through life stages until adulthood.
Plasticity within some pine species, or their ability to modify outcrossing rates (Ledig 1998), is probably due to adaptation to contrasting population densities that favour colonization of new environments in which trees self-fertilize more frequently, as has been described for P. radiata (Bannister 1965 cited in Ledig 1998). This flexibility, on the other hand, has not been strong enough to modify the mating system in such a way that a species would be predominantly self-fertilized. In fact, the mating system in pines is so open that interspecific hybridization is quite common in sympatric species, so that hybridization has frequently been proposed as a speciation mechanism in this genus (e.g., Bucci et al. 1998).

Mating system plasticity and morphological adaptation
One of the main conclusions of this work is that pines show a striking homogeneity in their genetic system, which includes homogeneity of expected heterozygosity but, most of all, nearly constant outcrossing rates for species that diverged 135 million years ago (see for example Hamrick & Godt 1990). Apparently there has been a strong limitation to the reduction of outcrossing rates in pines, and those populations or species in which this has happened have probably gone extinct.
These conclusions about the genetic structure probably do not apply to morphological characters. For example, while there is low differentiation in molecular markers (allozymes, RAPDs or microsatellites in P. sylvestris in Finland, Karhu et al. 1996), there is strong morphological differentiation, probably as a result of strong natural selection on morphological characters.
In view of the knowledge of the genus Pinus with respect to its uses, ecological services, morphological adaptations and conservation status, the group can be considered a model for establishing conservation strategies in tree species. Until now, proposals to establish conservation strategies in this genus can be summarized as follows. First, species that show fragmented distributions have higher genetic differentiation and outcrossing rates different from 100 %. North American pinyon pines belong to this group of species and require both in situ and ex situ conservation strategies, with emphasis on reforestation with young trees. The second strategy could be generated from a phylogenetic perspective: conservation of abundant tree species is as important as conservation of relictual lineages, so that speciation and adaptation processes continue. These conclusions should be put into practice in countries like Mexico with high deforestation rates. Finally, it has become clear that knowledge of comparative genetic structure in this group of trees is an excellent tool to develop strategies for their conservation.

Fig. 1: Frequency distribution of expected heterozygosity (top) and genetic differentiation (bottom) in pine species from published data using allozymes. Differentiation estimates are expressed in percentage.

TABLE 2
Average genetic diversity and genetic differentiation estimates for pine species using chloroplast microsatellite loci (standard deviations are shown in parentheses)

TABLE 3
Mean size (in bp) for different chloroplast microsatellite loci in both subgenera of Pinus. The loci identification corresponds to the notation of the chloroplast sequence of Pinus thunbergii (Wakasugi et al. 1994). Standard deviations are shown in parentheses