versión impresa ISSN 0716-9760
Biol. Res. v.37 n.4 supl.A Santiago 2004
Biol Res 37: 777-782, 2004
Analysis of rhythmic variance - ANORVA. A new simple method for detecting rhythms in biological time series
Peter Celec (1, 2, 3, 4)
1 Institute of Pathophysiology, Faculty of Medicine, Comenius University, Bratislava, Slovakia
Cyclic variations of variables are ubiquitous in biomedical science. A number of methods for detecting rhythms have been developed, but they are often difficult to interpret. A simple procedure for detecting cyclic variations in biological time series and quantification of their probability is presented here. Analysis of rhythmic variance (ANORVA) is based on the premise that the variance in groups of data from rhythmic variables is low when a time distance of one period exists between the data entries. A detailed stepwise calculation is presented including data entry and preparation, variance calculating, and difference testing. An example for the application of the procedure is provided, and a real dataset of the number of papers published per day in January 2003 using selected keywords is compared to randomized datasets. Randomized datasets show no cyclic variations. The number of papers published daily, however, shows a clear and significant (p<0.03) circaseptan (period of 7 days) rhythm, probably of social origin.
Key terms: chronobiology, time series, method, ANORVA, analysis of rhythmic variance, circaseptan, cycle, rhythm.
Cycles and periodicities can be found nearly everywhere in nature. Nevertheless, chronobiology still has the reputation of something strange and curious. Most rhythms in biomedicine can be classified according to their period lengths into circadian (period ~ 24 h), ultradian (period < 24 h), and infradian (period > 24 h). The best known is the circadian rhythm, e.g. the sleep-wake cycle. Its genetic basis and the interactions between the internal Zeitgeber (pacemaker) and external influences are well understood. Interested readers can find comprehensive information in a number of recent reviews (2; 4; 9; 10; 15; 21). An example for ultradian cycles is the variation of gonadotropin releasing hormone levels (8). Infradian rhythms include circannual variations and the circalunar menstrual cycle in women (13; 16). In a narrowinterpretation, infradian characterizes the cycles with shorter periods than the circalunar cycle. These infradian rhythms are rarely described. One possible reason for this is the number of methodological problems regarding cycle detection (7; 17). A growing number of statistical methods exist that should make the search for cycles in datasets easier (9; 11; 17; 18; 20). On the other hand, the variety of statistical procedures used indicates that there is no clear answer regarding which method is correct. I would like to add some entropy to this chaos and present a new method that is easy to use and understand for detecting periodicities in biological time series.
Most methods are very difficult to understand and use without specific commercial software. Moreover, the complexity of the procedures makes it difficult to accurately interpret the results, particularly when considering that every method has its own characteristics and requires specific data conditions. My method is based on the premise that groups of values from cycling time series have a low variance (theoretically zero) if the time distance between the data entries is one period of the cycle. That is why I call the method Analysis of Rhythmic Variance (ANORVA). The detailed step-by-step procedure is presented here so it can be reproduced and used by other scientists.
Step 1 _ Data entry and preparation
The minimum number of subjects/objects or datasets is 5. The quantity of observed data in every dataset determines the provable period lengths. Minimum provable period length (Min PPL) is triple the time distance between the two closest data entries in one subject/object, and the maximum provable period length (Max PPL) is half of the whole minimum observation period. Therefore, when a dataset consisting of daily measurements for 30 days, possible period lengths between 3 (3x1) and 15 (30/2) days can be proven. Randomized number sets represent the negative control used for the comparison with the real data. To show how this method works, two negative controls are used here and compared to each other. The randomized datasets should have the same dimensions as the proven variable. The positive control used was the number of papers published daily in January 2003 using selected keywords according to the PubMed Medline database. See Figure 1 for details.
Datasets must be full; no blank data entries are feasible. If no other possibility exists, entries should be filled using interpolation and averages of the data surrounding the blank entry. Datasets are not required to be of the same extent in every subject, although similar extents are desirable. Values that are clearly different from the rest should be excluded if they are affected by external influence of a known non-cyclic factor.
The values of January 1-2, 2003 were excluded as they were higher by order of magnitude in comparison with other data entries, probably due to the accumulation of the papers over the Christmas holidays. As this exclusion affects the size of the tested negative control dataset, the randomized datasets must be cut off in the same way.
If needed, the original data must be transformed to achieve normal distribution in biological data, often by the log-normal transformation. Data must be transformed into a non-dimensional variable by dividing the data entries through their average value. Thus, the average value of the new dataset must be 1. Moreover, if a significant trend is present, the time series must be detrended using the results of the relevant regression analysis.
Step 2 _ Variance calculating
The process begins with the Min PPL, calculating the variance of a group consisting of data entries with the Min PPL time distance between them. The same is repeated for the remaining phase shifts until the first phase shift is reached.
The first group of data will consist of our example of the 1, 4, 7, 10, , 3n+1 data entry. The second group (next phase shift) will consist of the 2, 5, 8, 11, , 3n+2. With 3 as the Min PPL, the third group (3, 6, 9, 12, , 3n) is the last group.
The average variance is calculated from the results of all phase shiftsand is repeated with every subject/object. The entire procedure is repeated for every period length between Min PPL and Max PPL.
Subjects/objects are represented in our example by the keywords. With Min PPL=3, the average variance can be calculated from 3 variances that respond to 3 possible phase shifts. This must be repeated with every keyword. Testing for the 4-day period is the same except that there are 4 possible phase shifts (4n+1, 4n+2, 4n+3 and 4n). Other periods are treated in the same way with an increasing number of possible phase shifts up to the Max PPL=15.
To obtain a similar dispersion of the results in randomized and real datasets, the resulting mean variances must be divided by the average variance and the dynamic weight of the results increased. A similar calculation was done in Step 1. Repeat the entire procedure with the randomized dataset. The results are the dynamic weighted average variances for phase shifts of every provable period for every subjects/object.
Step 3 _ Testing the difference
If n is the number of subjects/objects in the dataset, then for every tested period p there is a set of n dynamic weighted average variances for real data and n dynamic weighted average variances for randomized data. These two new groups of values must be compared using a simple Student t-test or a non-parametric test such as the Mann-Whitney U test. The t-test is preferred due to its statistical power and robustness; the one-tailed t-test is used here, although the choice of test type used in this step is not crucial for ANORVA. The resulting p-value is a quantification of the probability that a possible observed cycle is based only on coincidence. Thus, a low p-value, particularly one lower than 0.05, indicates a low periodic variance and a high probability for the existence of a cycle with the given period length in the proved dataset. Calculating p-values for every period makes it possible to compare the probabilities of the potential cycles and to find a cycle with a period length between Min PPL and Max PPL.
The process of calculating the p-values is simplified by using the built-in functions of office software packages; sophisticated statistical software is not necessary for these simple calculations, but an adjustment for multiplicity in testing, such as the Bonferroni correction, is needed to control the type I error rate. The resulting p-values are shown in graphs (Fig. 2). The randomized dataset is clearly free of cyclic variations, data on the papers published daily show an apparent and statistically significant rhythm of 7 days. This circaseptan cycle is the best-known rhythm of social origin (1, 6, 15). I believe that this cycle can be explained by the variations of publications due to the work-free weekend days, and indeed, the lowest number of publications coincides with Sundays.
It should be noted that further extensions of the method can be also used for synchronization of data and thus for preparing periodograms with an average cycle curve. The procedure is far more complicated, however, and beyond the scope of this article. The example contains synchronized data. The sole quantitative detection of rhythmic variations using ANORVA is independent from the de-synchronization of the subjects/objects as all possible phase shifts are analyzed.
The aim of this article was to offer a simple presentation of the basic principles and use of a new method for detecting cyclic variations in biological time series. Although other methods do exist, such as the widely-used Cosinor method and specific procedures based on Fast Fourier transformations (9; 11; 14;), they are difficult to interpret because they are dependent upon the morphology of the cycle (comparison to sinus wave) or have particular problems with noisy data. ANORVA is straightforward and easy to use and is independent of the morphology of the cycle curve. Its low specificity improves with the higher dataset sizes, but it can also handle small datasets.
An earlier study indicating the possibilities of the basic principle of ANORVA was the first to describe the circavigintan (period of 20 days) and circatrigintan (period of 30 days) cycles of salivary testosterone levels in men (5). Other studies on hormonal variations found in our lab using ANORVA have been submitted and are currently under consideration. Hopefully, the currently complicated chronobiological analysis will soon be easily understood and reproduced by interested scientists. Chronobiological research is a growing area of research throughout the world (19), and a simple method for the detection of periodical components of biological datasets is now more necessary than ever before.
Of course the ANORVA method must still be subjected to an objective evaluation, a detailed comparison with other methods, and have its weaknesses traced. However, this is a long-term process that requires time and feedback from other groups with experience using ANORVA on other datasets. Problems may occur with the number of randomized datasets, the choice of arithmetical rather than geometrical or other averages, or sensitivity toward hidden cycles. Identifying the method's limitations will help improve it and make it more universal, which is one of the reasons for providing a detailed description of the method's relatively simple calculation steps here. I believe that this clear and simple explanation will prove more helpful to readers than a complicated manuscript that could obscure more than it shows.
I would like to express my appreciation to my mentors Assoc. Prof. Daniela Ostatnikova, MD, PhD and Assoc. Prof. Bozena Chovancova, Ing, PhD for the support, to my co-workers Julius Hodosy, MD and Rastislav Kostka, Ing for the contribution to this manuscript, to Katarina Melisova, Ing, Michaela Prihodova, MD and Ingrid Jurkovicova, MD for the inspiration and especially to Matus Kudela, MSc for the fruitful discussions and sustained criticism, which were most important in the development of ANORVA. This work was supported by Grant of the Comenius University No. 116/2004.
1. AHLERS I, AHLERSOVA E, SMAJDA B, TOROPILA M, DATELINKA I (1988) Circaseptan rhythm of corticosterone and lipids in the serum of male rats. Physiol Bohemoslov 37: 49-55 [ Links ]
2. ALBRECHT U (2002) Invited review: regulation of mammalian circadian clock genes. J Appl Physiol 92: 1348-1355 [ Links ]
3. BITTER RA, HOOTMAN J, NIELSEN TW (1973) Fast Fourier transform analysis of the circadian rhythm of urinary 11-hydroxycorticosteroids in laboratory rats exposed to high environmental pressures of helium and oxygen. Aerosp Med 44: 1261-1263 [ Links ]
4. BUNNEY WE, BUNNEY BG (2000) Molecular clock genes in man and lower animals: possible implications for circadian abnormalities in depression. Neuropsychopharmacol 22: 335-345 [ Links ]
[ Links ]
6. DE PRINS J (1996) The problematic cosinor (Article in French). Pathol Biol (Paris) 44: 201-204 [ Links ]
7. DELYUKOV A, GORGO Y, CORNELISSEN G, OTSUKA K, HALBERG F (2001) Infradian, notably circaseptan testable feed sidewards among chronomes of the ECG and air temperature and pressure. Biomed Pharmacother 55 Suppl 1: 84s-92s [ Links ]
8. FILICORI M, FLAMIGNI C, CAMPANIELLO E, FERRARI P, MERIGGIOLA MC, MICHELACCI L, PARESCHI A, VALDISERRI A (1989) Evidence for a specific role of GnRH pulse frequency in the control of the human menstrual cycle. Am J Physiol 257: E930-E936 [ Links ]
9. FUKADA Y, OKANO T (2002) Circadian clock system in the pineal gland. Mol Neurobiol 25: 19-30 [ Links ]
10. GUIDO ME, CARPENTIERI AR, GARBARINO-PICO E (2002) Circadian phototransduction and the regulation of biological rhythms. Neurochem Res 27: 1473-1489 [ Links ]
11. HALBERG F (1970) Frequency spectra and cosinor for evaluating circadian rhythms in rodent data and in man during Gemini and Vostok flights. Life Sci Space Res 8: 188-214 [ Links ]
12. HECHT K, CORNELISSEN G, FIETZE I, KATINAS G, HEROLD M, HALBERG F (2002) Circaseptan aspects of self-assessed sleep protocols covering 70 nights on 33 clinically healthy persons. Percept Mot Skills 95: 258-266 [ Links ]13. LUTOSLAWSKA G, TKACZYK J, PANCZENKO-KRESOWSKA B, HUBNER-WOZNIAK E, SKIERSKA E, GAJEWSKI AK (2003) Plasma TBARS, blood GSH concentrations, and erythrocyte antioxidant enzyme activities in regularly menstruating women with ovulatory and anovulatory menstrual cycles. Clin Chim Acta 331: 159-163 [ Links ]
14. MIKULECKY M (1994) In defense of proper cosinor analysis. Chronobiologia 21: 331-335 [ Links ]
15. OKAMURA H, YAMAGUCHI S, YAGITA K (2002) Molecular machinery of the circadian clock in mammals. Cell Tissue Res 309: 47 -56 [ Links ]
16. OSTATNIKOVA D, PUTZ Z, MATEJKA P, ORSZAGH M (1995) The variation of the level of salivary testosterone in men and women during a year (in Slovak). Prakt Gynek 2: 4 5-48 [ Links ]
17. RAO AV, SHARMA VK (2002) A simple approach for the computation of multiple periodicities in biological time series. Biol Rhythm Res 33: 487-502 [ Links ]
18. REFINETTI R (1991) Use of chi square periodogram in the analysis of estrous rhythmicity. Int J Biomed Comput 27: 125-132 [ Links ]
19. RICHTER HG, TORRES-FARFAN C, ROJAS-GARCIA PP, CAMPINO C, TORREALBA F, SERON-FERRE M (2004) The circadian timing system: making sense of day/night gene expression. Biol Res. 37: 11-28 [ Links ]
20. SOKOLOVE PG, BUSHELL WN (1978) The chi square periodogram: its utility for analysis of circadian rhythms. J Theor Biol 72: 131-160 [ Links ]
21. STANEWSKY R (2003) Genetic analysis of the circadian system in Drosophila melanogaster and mammals. J Neurobiol 54: 111-147 [ Links ]
Corresponding Author: Peter Celec, MD, Ing., B.Sc., Galbavého 3, 841 01 Bratislava, Slovakia, Telephone: (421) 903-507 822, E-mail: firstname.lastname@example.org
Received: June 16, 2004. In revised form: July 12, 2004. Acepted: August 18, 2004.