Finding patterns of distribution for freshwater phytoplankton, zooplankton and fish, by means of parsimony analysis of endemicity

Over recent decades, limnological studies of Chilean systems have documented the species composition and main environmental variables of many water bodies distributed over a wide latitudinal interval, from 18° to 53° S. However, we still lack a comprehensive view of the structure and functioning of regional freshwaters. In this work we review the available information about pelagic biota from Chilean basins, in order to reveal patterns of species distribution and their possible association with environmental variables. We built presence-absence matrices for phytoplankton, zooplankton and fish over lakes and basins. From this database, we performed parsimony analysis of endemicity as a tool for determining fundamental distribution patterns of freshwater biota. We also assessed the relationship between species occurrences and the available site-related variables. Our results indicated that latitude exerted the strongest influence on species distribution, although altitude, longitude, and area also exerted significant effects for some groups. In addition, our results suggest a relationship between the degree of vagility of the groups and the degree of metacommunity structuring, as reflected in the number of endemicity areas.

stage from the viewpoint of scientific development, and accordingly, most ecologically oriented work on freshwater habitats has been mainly descriptive and comparative, with modest progress in experimental or theoretical research.
In spite of several decades of valuable descriptive work, regional information about freshwater biota is fragmented and biased, both geographically and taxonomically. As a consequence, we lack basic knowledge about patterns of distribution and biodiversity, which contrasts sharply with the abundance and variety of freshwater bodies found in the country. Chilean freshwaters exhibit a wide spectrum of physical features associated with strong gradients of climate, lithology, topography and vegetation cover, in agreement with their broad latitudinal range and the strong influences of the high Andes and the Pacific Ocean.
Here we intend for the first time to review the available information about key components of pelagic biota from Chilean freshwaters, in order to reveal patterns of species distribution and their possible association with environmental variables.
Parsimony analysis of endemicity (PAE) is a powerful biogeographical tool, useful for determining distributional patterns of species and for identifying biodiversity hotspots (Garrafoni et al. 2006). For these reasons, PAE can help to build a comprehensive view of some aspects of the structure of freshwater biota, as well as to guide conservation planning. Originally developed by Rosen (1988), PAE aims to classify areas by the most parsimonious solutions based on the shared presence of species (Nihei 2006). This allows for the identification of areas with non-random distributional congruence among different species (Morrone 1994).
In this work we review the available information about the distribution of freshwater phytoplankton, zooplankton, and fish species from mainland Chile. We use the compiled data to identify basic distribution patterns for the analyzed biota by (a) testing whether the distribution of species across lakes and basins can be statistically explained by the environmental variables at hand, and (b) identifying areas of endemism for each of the groups of interest.

Database
We attempted to review all published information (in both indexed and non-indexed journals) about the species distribution of phytoplankton, zooplankton and fish from Chilean freshwaters. We were able to find 41 articles with reliable data, and three additional papers containing environmental information needed for our analyses. The published data spanned the years 1973 to 2006. We studied 55 lakes located between 18.25° and 53.46° S, although most of them are concentrated between 32.54° and 46.5° S (Fig. 1).
From this dataset we constructed presence/absence matrices for each taxon (available upon request), with species as columns and sites as rows. Lakes and basins were treated separately as sites. For phytoplankton (found in 22 lakes), we constructed matrices for the three major taxa Bacillariophyceae (diatoms, 18 lakes), Chlorophyceae (green algae, 18 lakes), and Cyanophyceae (blue-green algae, 16 lakes). For zooplankton (52 lakes), we analyzed separately the three major taxa Cladocera (41 lakes), Copepoda (47 lakes), and Rotifera (31 lakes). Fish were treated as a single taxon and only data by basins were available.
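As an illustration of the matrix layout described above, the following minimal Python sketch builds a presence/absence matrix with sites as rows and species as columns from a list of (site, species) occurrence records. The function name `presence_absence` and the placeholder site and species labels are hypothetical and are not taken from the original database.

```python
def presence_absence(records):
    """Build a site-by-species 0/1 matrix from (site, species) records.

    Hypothetical helper: rows are sites, columns are species, as in the text.
    """
    records = list(records)                      # allow any iterable
    sites = sorted({site for site, _ in records})
    species = sorted({sp for _, sp in records})
    present = set(records)
    matrix = [
        [1 if (site, sp) in present else 0 for sp in species]
        for site in sites
    ]
    return sites, species, matrix


# Toy records with placeholder names (not real data).
toy_records = [
    ("lake_2", "species_b"),
    ("lake_1", "species_a"),
    ("lake_1", "species_b"),
]
sites, species, matrix = presence_absence(toy_records)
for site, row in zip(sites, matrix):
    print(site, row)
```

Duplicate records are harmless in this sketch because occurrences are stored in a set before coding.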
In order to avoid anecdotal records, we did not consider lakes below the 10th percentile of species richness within each taxonomic group. For species names, we maintained the nomenclature used in the original references.
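The richness filter can be sketched as follows. This is only an illustration under stated assumptions: the text does not say which percentile definition was used, so the sketch assumes linear interpolation between order statistics, and it assumes that lakes exactly at the cutoff are retained. The names `percentile` and `drop_poor_sites` are hypothetical.

```python
import math


def percentile(values, p):
    """p-th percentile by linear interpolation (an assumed definition)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * p / 100
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def drop_poor_sites(sites, matrix, p=10):
    """Keep only sites whose species richness is at or above the cutoff."""
    richness = [sum(row) for row in matrix]
    cutoff = percentile(richness, p)
    kept = [
        (site, row)
        for site, row, r in zip(sites, matrix, richness)
        if r >= cutoff            # assumption: ties at the cutoff are kept
    ]
    return [site for site, _ in kept], [row for _, row in kept]


# Toy example: 11 placeholder lakes with richness 1, 2, ..., 11.
toy_sites = [f"lake_{i}" for i in range(1, 12)]
toy_matrix = [[1] * i + [0] * (11 - i) for i in range(1, 12)]
kept_sites, kept_matrix = drop_poor_sites(toy_sites, toy_matrix)
print(len(toy_sites), "->", len(kept_sites), "sites retained")
```

The filter would be applied separately to the matrix of each taxonomic group, as stated in the text.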
We also reviewed the available environmental information associated with each site. For lakes, we recorded area, latitude, longitude, and altitude. Most of these data were obtained from the published information. For lakes Pichilafquen, Quillehue, and Chiguay, geographic coordinates were taken from Google Earth (http://earth.google.com). Surface areas for lakes Pichilafquen, Quillehue, Chiguay, Atravesado, Bonita, Huilipilun, La Posada, Patos Bravos, and Lynch were measured with the software Image Tool 3.0 (http://ddsdx.uthscsa.edu/dig/itdesc.html) from images obtained with Google Earth.
A total of 40 basins were defined according to the official website of the Dirección General de Aguas (http://www.dga.cl), from which we also obtained their surface area and perimeter. To determine the characteristic geographic coordinates of each basin, we used the following protocol. For exorheic basins between 22.47° and 48.20° S, basin coordinates were defined as those of the middle point of their main river. For coastal basins between 31.78° and 38.25° S, basin latitude was defined as the mean of the latitudes of the two rivers that limit the basin, and longitude was defined as that of the shoreline at the basin latitude. Basins between 42.54° and 53.83° S present a large region of flooded land, and their coordinates correspond to those of the middle point of the entire area. This same procedure was used for basins Altiplanica and Budi. Chilean basins were also grouped into seven hydrographic zones, following official terms (http://www.igm.cl). Matrices with phytoplankton and zooplankton by basins were constructed from the information on species occurrences in the lakes that belong to each basin.
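The step from lake matrices to basin matrices can be sketched as below. The sketch assumes that a species is coded as present in a basin when it was recorded in at least one lake of that basin; the text does not spell out the aggregation rule, so this is an assumption. The function name `lakes_to_basins` and all labels are hypothetical.

```python
def lakes_to_basins(lake_rows, lake_basin):
    """Aggregate lake presence/absence rows into basin rows.

    lake_rows:  {lake name: list of 0/1 values, one per species}
    lake_basin: {lake name: basin label}
    Assumption: presence in any lake of a basin means presence in the basin.
    """
    basin_rows = {}
    for lake, row in lake_rows.items():
        basin = lake_basin[lake]
        if basin in basin_rows:
            # Element-wise logical OR of the two 0/1 rows.
            basin_rows[basin] = [a | b for a, b in zip(basin_rows[basin], row)]
        else:
            basin_rows[basin] = list(row)
    return basin_rows


# Toy example with placeholder lakes and basins (not real data).
toy_lakes = {
    "lake_1": [1, 0, 0],
    "lake_2": [0, 1, 0],
    "lake_3": [0, 0, 1],
}
toy_assignment = {"lake_1": "basin_X", "lake_2": "basin_X", "lake_3": "basin_Y"}
print(lakes_to_basins(toy_lakes, toy_assignment))
```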

Multivariate statistics
In order to establish the possible relationship between the presence/absence of the taxa in Chilean lakes and basins and the environmental variables described above, we performed a multivariate direct gradient analysis, specifically a canonical correspondence analysis (CCA, Ter Braak 1986). CCA extracts continuous axes of variation from species occurrences in the light of known environmental variables (EV), by imposing the constraint that the axes are linear combinations of EV. Hence, the relationships between species occurrences and EV are assumed to be linear. In order to determine whether these relationships are statistically significant, we performed Monte Carlo permutation tests with 1,000 runs (Manly 1991). The results of statistically significant CCA are displayed using biplot diagrams of sites (symbols) and EV (vectors). Both CCA and permutation tests were done using the package CANOCO v. 4.5 (Ter Braak & Smilauer 2002).
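The analyses were run in CANOCO, and the sketch below is not that implementation. It only illustrates the logic of a Monte Carlo permutation test in the simplest case, a CCA constrained by a single environmental variable, where the constrained inertia reduces to one canonical eigenvalue. The function names, the toy matrix, the random seed, and the choice to permute the environmental values across sites are all assumptions made for illustration. The sketch also assumes that no site and no species is empty and that the variable is not constant.

```python
import math
import random


def constrained_inertia(Y, x):
    """Canonical eigenvalue of a CCA of the 0/1 matrix Y (sites x species)
    constrained by a single environmental variable x (one value per site)."""
    total = sum(sum(row) for row in Y)
    row_w = [sum(row) / total for row in Y]          # site weights
    col_w = [sum(col) / total for col in zip(*Y)]    # species weights
    # Standardise x using the site weights.
    mean = sum(w * v for w, v in zip(row_w, x))
    var = sum(w * (v - mean) ** 2 for w, v in zip(row_w, x))
    z = [math.sqrt(w) * (v - mean) / math.sqrt(var) for w, v in zip(row_w, x)]
    inertia = 0.0
    for j, cj in enumerate(col_w):
        # Project the chi-square-standardised species column onto z.
        proj = sum(
            z[i] * (Y[i][j] / total - row_w[i] * cj) / math.sqrt(row_w[i] * cj)
            for i in range(len(Y))
        )
        inertia += proj ** 2
    return inertia


def permutation_test(Y, x, n_perm=1000, seed=0):
    """Pseudo P-value: share of permutations with inertia >= observed."""
    rng = random.Random(seed)
    observed = constrained_inertia(Y, x)
    shuffled = list(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)                 # break the site-variable link
        if constrained_inertia(Y, shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: six placeholder sites, four placeholder species (not real data).
Y = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
latitude = [18.0, 23.0, 33.0, 38.0, 46.0, 53.0]
observed, p_value = permutation_test(Y, latitude)
print(f"constrained inertia = {observed:.3f}, P = {p_value:.3f}")
```

With several environmental variables the constrained inertia is the sum of all canonical eigenvalues and requires a weighted multiple regression, which dedicated ordination software handles.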

Parsimony analysis of endemicity
We performed parsimony analysis of endemicity (PAE, Rosen 1988, Morrone 1994) in order to determine possible areas of endemism. The analysis was based on a data matrix with species (characters) as columns and basins, treated as operational geographic units (OGUs), as rows. The character states were coded as presence/absence (1/0) of the species at each site. For these analyses we used basins instead of lakes as sites, in order to minimize the effect of incomplete sampling of species across lakes. A hypothetical basin without any species present was used for rooting the tree. Data were analyzed with the software PAUP* 4.0b10 (Swofford 2001), using the heuristic search algorithm and randomizing the entry order of the OGUs with 100 replicates. All characters were treated as ordered (Wagner parsimony). We calculated the strict consensus tree, which conserves the most robust groupings of localities and minimizes the influence of widely distributed species (Morrone 1994). A clade defined by at least two species was considered to be an endemicity area (Morrone 1994). We also performed PAE for a matrix that included all species and basins, where a clade defined by at least two species of each taxon was considered to be an endemicity area.
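The sketch below illustrates two pieces of the procedure described above: appending the hypothetical all-absent basin used for rooting, and applying the two-species criterion to a candidate group of basins. It does not perform the parsimony search or build a consensus tree, which were done in PAUP*. As a simplification, a species is taken to support a group only when it is present in every basin of the group and absent everywhere else, which is stricter than support on a tree because it allows no homoplasy. All function names and labels are hypothetical.

```python
def add_root(basins, matrix, root_name="ROOT"):
    """Append a hypothetical basin with no species, used to root the tree."""
    n_species = len(matrix[0])
    rooted = [list(row) for row in matrix] + [[0] * n_species]
    return list(basins) + [root_name], rooted


def exclusive_species(basins, species, matrix, group):
    """Species present in all basins of `group` and absent elsewhere."""
    inside = [i for i, b in enumerate(basins) if b in group]
    outside = [i for i, b in enumerate(basins) if b not in group]
    if not inside:
        return []
    return [
        sp
        for j, sp in enumerate(species)
        if all(matrix[i][j] for i in inside)
        and not any(matrix[i][j] for i in outside)
    ]


def is_endemicity_area(basins, species, matrix, group, min_species=2):
    """Apply the 'at least two species' criterion to a candidate group."""
    return len(exclusive_species(basins, species, matrix, group)) >= min_species


# Toy data with placeholder basins and species (not real data).
toy_basins = ["basin_A", "basin_B", "basin_C"]
toy_species = ["sp_1", "sp_2", "sp_3"]
toy_matrix = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
]
rooted_basins, rooted_matrix = add_root(toy_basins, toy_matrix)
candidate = {"basin_A", "basin_B"}
print(exclusive_species(rooted_basins, toy_species, rooted_matrix, candidate))
print(is_endemicity_area(rooted_basins, toy_species, rooted_matrix, candidate))
```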

Multivariate statistics
Seven CCA were statistically significant (Monte Carlo permutation test, P < 0.05, Table 1), from a total of 17 studied groups. Total zooplankton, copepods, rotifers, and fish consistently exhibited significant associations with the studied EV. In Fig. 2-4 we show biplots summarizing the relationship between the freshwater biota (symbols, classified according to Chilean hydrographic zones) and EV (vectors).
In general terms, latitude represented the main gradient for all groups considered here. This gradient followed the north-to-south succession through the hydrographic zones, crossing the three main climatic regions of Chile: arid, Mediterranean and temperate. In the analyses based on lakes, the second gradient was dominated by altitude, except for rotifers, where surface area was the main factor. On the other hand, the second gradient in basin-based analyses was dominated by either longitude (total zooplankton, copepods) or surface area (rotifers, fish). Note that for rotifers, surface area covaried with the perimeter of the basin.
Results of CCA accounting for the distribution of total zooplankton on lakes indicated that the first two axes represent 80.6 % of total variance (Fig. 2A). The first axis was mainly correlated with latitude (r = -0.91), whereas the second axis was correlated with altitude (r = 0.75). In CCA by basins (Fig. 2B), the first two axes represented 62.5 % of total variance; the first axis was mainly correlated with latitude (r = -0.80) and the second axis was correlated with longitude (r = 0.75).
In CCA of copepod distribution on lakes (Fig. 3A), the first axis was mainly correlated with latitude (r = -0.82) and the second axis was correlated with altitude (r = 0.80). In CCA by basins (Fig. 3C), the first two axes represented 70.5 % of total variance. The first axis was mainly correlated with latitude (r = -0.58), whereas the second axis was correlated with longitude (r = 0.88). Multivariate analyses testing the association between rotifer distribution on lakes and EV (Fig. 3B) showed that the first two axes represent 62.4 % of total variance, where the first axis was mainly correlated with latitude (r = 0.90) while the second axis was correlated with surface area (r = 0.60). In CCA by basins (Fig. 3D), the first two axes represented 60.8 % of total variance; the first axis was mainly correlated with latitude (r = -0.75) and the second axis with area (r = -0.44).
Finally, in CCA of fish (Fig. 4) the first axis was mainly correlated with latitude (r = 0.83), whereas the second axis was correlated with surface area (r = 0.35).
The analysis of the total information (i.e., a single matrix with all taxa and basins included) did not provide additional information, showing that latitude and area constituted the main driving gradients for the biota analyzed here.

Parsimony analysis of endemicity
Based on the analyzed information, endemicity areas were identified for all taxonomic groups with the exception of blue-green algae. Green algae (Chlorophyceae, Fig. 5A and 8A) exhibited two partially overlapping endemicity areas. The first clade (defined by Closterium acutum and Sphaerocystis schroeteri) is located between 19.913° and 51.247° S, and was formed by three adjoining basins (AA, AB, AC, see Table A-1 in the appendix) and four disjoint basins (A, N, S, and AL). The second clade defined for green algae (Ankistrodesmus falcatus, Scenedesmus quadricauda and Staurastrum polymorphum) is located between 33.317° and 39.015° S, and comprised four basins, inside which there were two sub-areas of endemicity: L, Z and R, T. For diatoms (Bacillariophyceae, Fig. 5B and 8B) we observed two endemicity areas. The first clade (defined by Aulacoseira granulata, Cocconeis placentula, and Synedra rumpens) comprised three disjoint basins between 19.913° and 42.542° S: A, N, and AD. The second clade (Melosira granulata and Surirella guatimalensis) extends between 33.317° and 51.247° S and was composed of 10 basins (L, R, M, S, T, Z, AB, AC, AL, and AA).
Rotifers exhibited two endemicity areas. The first clade (defined by Keratella valga and Lepadella ovalis) was formed by two adjacent basins (J-K, 32.664° to 32.735° S), while the second (Keratella cochlearis, Conochilus unicornis, Hexarthra fennica and Polyarthra vulgaris, 33.317° to 42.542° S) was represented by a group of nine basins that included two continuous sub-areas: basins L-M-N, and the region from basin Z to AC plus the island AD (Fig. 6A and 9A). Copepods exhibited three endemicity areas. The first clade (Tumeodiaptomus diabolicus and Tropocyclops prasinus meridionalis) is a large group of eight basins located between 33.711° and 42.542° S, inside which there was a continuous sub-area composed of basins Z, AA, AB, and AC. A second endemicity area for copepods (Parabroteas sarsii and Eucyclops serrulatus) lies between 45.313° and 47.468° S and was defined by the union of AH and AI. The last area (Diaptomus diabolicus and Mesocyclops longisetus) extends from 32.427° to 34.114° S and corresponded to the disjoint basins J-L-N (Fig. 6B and 9B). The last zooplankton group, Cladocera, presented a single large endemicity area (defined by Diaphanosoma chilense and Daphnia ambigua) covering basins A, L, M, N, V, Z, AA, AB, and AD, inside which there was a continuous sub-area composed of basins Z, AA, and AB, located between 39.015° and 40.334° S (Fig. 6C and 9C).
australis, Percichthys melanops. In the same vein, the third and fourth endemicity areas would disappear.
The analysis of the total information (all taxa included) showed four endemicity areas. The first area appeared in all groups analyzed (basins Z-AD). The second area (L, N) was already defined for plankton groups, the area M, S appeared for green algae, and the fourth area, AL, was defined for green algae, diatoms, and copepods. The number of retained trees was 7717, with a length of 819. The Consistency Index of this tree was 0.37, and the Retention Index was 0.46.
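For reference, the two indices reported above can be computed from a presence/absence matrix and a tree length with the standard ensemble definitions, CI = M/S and RI = (G - S)/(G - M), where S is the observed tree length, M is the minimum possible number of steps and G is the maximum possible number of steps summed over characters. The sketch below is a generic illustration for binary characters, not the PAUP* routine; the function name and the toy matrix are hypothetical. For a binary character, the minimum is one step if both states occur and the maximum is the count of the rarer state.

```python
def ensemble_indices(matrix, tree_length):
    """Ensemble consistency and retention indices for binary characters.

    matrix: rows are taxa (here basins, including the all-zero root),
            columns are characters (species) coded 0/1.
    tree_length: observed number of steps S on the tree being evaluated.
    Assumes G differs from M, otherwise RI is undefined.
    """
    min_steps = 0    # M
    max_steps = 0    # G
    for column in zip(*matrix):
        ones = sum(column)
        zeros = len(column) - ones
        if ones and zeros:              # character varies among taxa
            min_steps += 1
        max_steps += min(ones, zeros)
    ci = min_steps / tree_length
    ri = (max_steps - tree_length) / (max_steps - min_steps)
    return ci, ri


# Toy matrix: four placeholder taxa, two characters (not real data).
toy = [
    [1, 1],
    [1, 0],
    [0, 1],
    [0, 0],
]
# On the tree ((t1,t2),(t3,t4)) the first character needs one step and the
# second needs two, so the tree length is 3.
ci, ri = ensemble_indices(toy, tree_length=3)
print(f"CI = {ci:.2f}, RI = {ri:.2f}")
```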
Our results revealed that basins AB (Bueno), N (Rapel), Z (Toltén), AA (Valdivia), and L (Coastal Aconcagua-Maipo) form part of endemicity areas for all groups studied here, excluding blue-green algae. Basins AC (Basin/Island Bueno-Puelo) and M (Maipo) form part of endemicity areas for five out of six groups (excluding blue-green algae). Cyanophyceae did not present endemicity areas. For each group, symbols were associated with endemicity areas (denoted by brackets), which are depicted and described in Fig. 8.

DISCUSSION
Our CCA results revealed that neither phytoplankton nor cladocerans exhibit a distribution trend that could be explained by the environmental variables considered here. On the other hand, latitude was the main factor explaining the species distribution of total zooplankton, copepods, rotifers, and fish. Altitude (in lakes) or longitude (in basins) also constituted an important gradient for total zooplankton and copepods, while surface area did so for rotifers. Since this work only assessed general distributional trends of a regional freshwater biota, the mechanisms generating the observed patterns are still unknown. Nevertheless, previous works emphasize that phytoplankton distribution is often driven by dispersal, while the distribution of fish is driven by vicariance. This relates to the relative vagility of the two groups, with zooplankton lying midway between phytoplankton and fish. This agrees with the lack of significance in CCA results for algae.
Our results from endemicity analyses show that groups that are highly vagile owing to their passive dispersal mechanisms, such as algae, rotifers and cladocerans, exhibit at most two endemicity areas over the studied region. The intermediate level of vagility of copepods agrees with the intermediate number of endemicity areas found for this group, which are distributed along a latitudinal gradient. Fish exhibit the lowest relative vagility and, consistently, they show the largest number of endemicity areas along the north-south gradient.
As a cautionary note, consider that some endemicity areas may have been defined by species that are widely distributed in a global sense. Nevertheless, the species that define endemicity areas within a region have a restricted distribution therein.
influence the degree of metacommunity structure. Previous studies also support latitudinal patterns of distribution. For example, Heino (2001) found latitudinal gradients in the species distribution of macrophytes, beetles, dragonflies, and stoneflies in the Northern Hemisphere; Chengalath & Koste (1989, see also Segers 1996) found latitudinal variation in the distribution patterns of rotifers in the circumpolar region; and Weckström & Korhola (2001) found similar trends for diatoms in Fennoscandia.
Although the dispersal abilities of organisms, particularly of planktonic ones, are important determinants of their current distribution (Fenchel et al. 1997, Finlay 2002, Fenchel & Finlay 2004), establishment at a given site strongly depends on the success of species in the face of abiotic constraints and natural enemies, as well as on their ability to exploit local resources. On the other hand, new species can also appear by speciation mechanisms other than vicariance (Maidana et al. 2005, Pedrós-Alió 2006). In this way, local physical and chemical conditions of water bodies, such as light penetration, pH, nutrient availability, and lake morphometry, among others, might exert significant effects on species richness and distribution (Gutiérrez-Aguirre & Suárez-Morales 2001, Duggan et al. 2002). Unfortunately, we lack such information for the bulk of the systems analyzed here, and consequently we were not able to test their influence. Nevertheless, our results may serve as a starting point for future research on the metacommunity structure of regional freshwaters, as well as for focal hypothesis testing.
Latitude is associated with climate and soil composition, which partially determine basin and lake morphometry, water chemistry, and temperature. The biology of organisms influences their life histories as well as their dispersal abilities, and hence their colonization probabilities. At the same time, environmental features interacting with biological traits influence the structure and dynamics of populations and communities, which in turn affects local persistence. Therefore, starting from the rough relationships between freshwater biota and environmental variables shown here, future research on local systems could advance towards revealing the actual ecological mechanisms driving the distributional patterns of species occurrences. Finally, our analysis also identified geographic areas of endemicity for each of the groups considered, and the species that define them. These results should be considered for decision making in the field of natural reserve design and conservation planning.
APPENDIX 1. Lakes and basins analyzed, with their corresponding labels, hydrographic zone (I to VII from north to south), and references (*).

Fig. 1: Map of mainland Chile, indicating the 55 lakes used in the analyses.

Fig. 3: Summary results of canonical correspondence analysis for copepods (A and C) and rotifers (B and D). (A) Copepods by lakes, the first two axes represent 74.2 % of total variance; (B) rotifers by lakes, the first two axes represent 62.4 % of total variance; (C) copepods by basins, the first two axes represent 70.5 % of total variance; (D) rotifers by basins, the first two axes represent 60.8 % of total variance. Note that perimeter and area are autocorrelated in (D).

Fig. 4: Summary results of canonical correspondence analysis for fish. The first two axes represent 78 % of total variance.

TABLE 1. Monte Carlo test of significance of all canonical axes with 1,000 permutations. Canonical correspondence analysis performed for both lakes and basins throughout continental Chile; NS indicates P > 0.05.