INTRODUCTION

In daily practice, professionals know that their work is not based on the use of statistics, mainly because they are accustomed to working with people in a one-on-one setting rather than with groups of individuals. This circumstance risks discouraging these professionals from developing research in their area. It is important to highlight that researchers with some notion of statistical procedures can understand and manage the variability in data, which is often produced by the uncertainties that accompany health problems. Statistics are important in part because they support academic research at different levels and subsequently generate rigorous knowledge for the benefit of society. Currently, there is a growing interest in statistical knowledge because many authors are convinced that statistics must be understood through inductive and deductive reasoning. Statistical education is not based solely on formulas and figures, but also on the interpretation of data so that decisions can be made correctly.

The present study is qualitative and descriptive: the relationships between variables and their respective statistical procedures were analyzed and described in narrative form. The results are relevant and applicable, since the study reports on statistical procedures proposed for dental morphology in order to support the professional development of researchers in biostatistics.

Statistics is a branch of science that deals with the collection, organisation and analysis of data and the drawing of inferences from samples to the whole population. This requires a proper design of the study, an appropriate selection of the study sample and the choice of a suitable statistical test. An adequate knowledge of statistics is necessary for the proper design of an epidemiological study or a clinical trial. Improper statistical methods may result in erroneous conclusions, which may lead to unethical practice (Ali & Bhaskar, 2016).

Statistical procedure corresponds to these steps:

a) Data collection.

b) Statistical data analysis.

c) Presentation.

Quantitative or qualitative information described in research is gathered during data collection. There are various ways to collect data, but it is important to emphasize that this can be performed by simple observation or through other procedures, which are often complex depending on the proposed design.

Statistical data analysis comprises two main statistical methods: descriptive and inferential analysis. The first, more widely known, consists of describing a data set (collecting data, presenting data in tabular or graphical form, summarizing measurements, etc.) to interpret the behavior of variables. Inferential analysis consists of applying certain statistical techniques to generalize or infer results from the sample to the study population.

Research is at times mistaken for gathering information, documenting facts, and rummaging for information (Leedy & Ormrod, 1974). Research is the process of collecting, analyzing, and interpreting data in order to understand a phenomenon (Leedy & Ormrod). The research process is systematic in that defining the objective, managing the data, and communicating the findings occur within established frameworks and in accordance with existing guidelines. The frameworks and guidelines provide researchers with an indication of what to include in the research, how to perform the research, and what types of inferences are probable based on the data collected.

Research originates with at least one question about one phenomenon of interest. For example, what competencies might inhibit or enhance the accession of women into senior leadership positions? Or what leadership factors might influence the retention choices of registered nurses? Research questions, such as the two preceding questions, help researchers to focus thoughts, manage efforts, and choose the appropriate approach, or perspective from which to make sense of each phenomenon of interest (Williams, 2007).

The three common approaches to conducting research are quantitative, qualitative, and mixed methods. The researcher anticipates the type of data needed to respond to the research question. For instance, are numerical data, textual data, or both needed? Based on this assessment, the researcher selects one of the three approaches to conduct research. Researchers typically select the quantitative approach to respond to research questions requiring numerical data, the qualitative approach for research questions requiring textual data, and the mixed methods approach for research questions requiring both numerical and textual data (Leedy & Ormrod).

MATERIAL AND METHOD

This study corresponds to a cross-sectional descriptive-observational design. We reviewed and collected information through documentary analysis, addressing aspects in statistical test development and applications in dentistry. We used unpublished data corresponding to studies carried out between 2014 and 2017 by researchers at the Center for Research in Odontological Sciences (CICO).

RESULTS

Descriptive Statistics. This procedure describes the preliminary behavior of the data. Data are presented in tables and graphs, from which central tendency measures, also called summary measures, are calculated for subsequent analysis.

Within this type of analysis, it is essential to consider the following procedures: data presentation, frequency distribution tables and summary measures (mean, median, mode, quartiles, range, variance, standard deviation and variation coefficients).

In this context, variables also play an important role. They are defined as characteristics that can take different values across observations, and they are not necessarily numerical. Some examples of non-numerical variables are sex, treatment type, gingival recession, etc. Variables can be classified as qualitative, corresponding to the attributes or qualities of study subjects, and quantitative, representing numerical values of study subject characteristics. Qualitative variables can be nominal, meaning that their categories have no pre-established order (sex, marital status, vital status, etc.), or ordinal, meaning that their categories follow a pre-established order (educational level, patient disease, etc.). Quantitative variables can be classified as continuous or discrete. Continuous quantitative variables can take any of the infinitely many values within a range of numbers (weight, height, etc.), whereas discrete quantitative variables take only a countable set of values within a range of numbers (number of cavities, number of patients treated, etc.).

Descriptive measures. The following are descriptive measures, to be used depending on variable type and study objectives:

- Qualitative variables: Rate, ratio and proportion.

- Quantitative variables: Central tendency (averages, modes, medians) and variability (range, variance, standard deviation, variation coefficients) measures.
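
As an illustration of the measures listed above, the following minimal sketch computes them with Python's standard `statistics` module. The data (`cavities`, `sex`) are hypothetical values chosen for the example and are not taken from the studies reviewed here.

```python
import statistics

# Hypothetical data, for illustration only (not taken from the study).
cavities = [0, 1, 1, 2, 2, 2, 3, 4, 5, 10]   # quantitative, discrete
sex = ["F", "M", "F", "F", "M"]               # qualitative, nominal

# Qualitative variable: proportion and ratio.
proportion_female = sex.count("F") / len(sex)
ratio_female_to_male = sex.count("F") / sex.count("M")

# Quantitative variable: central tendency.
mean = statistics.mean(cavities)
median = statistics.median(cavities)
mode = statistics.mode(cavities)

# Quantitative variable: variability.
value_range = max(cavities) - min(cavities)
variance = statistics.variance(cavities)      # sample variance (divides by n - 1)
std_dev = statistics.stdev(cavities)          # sample standard deviation
coeff_variation = std_dev / mean              # variation coefficient
quartiles = statistics.quantiles(cavities, n=4)

print("proportion F:", proportion_female, "ratio F:M:", ratio_female_to_male)
print("mean:", mean, "median:", median, "mode:", mode)
print("range:", value_range, "variance:", round(variance, 3),
      "SD:", round(std_dev, 3), "CV:", round(coeff_variation, 3))
print("quartiles:", quartiles)
```

In this example the mean (3) is pulled above the median (2) by the single large value, which is why both measures are usually reported together with a measure of variability.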

Statistical inference. Quantitative research in health sciences involves the collection of data from different sources. This data includes demographic variables (e.g. age, gender), biological variables (e.g. weight, blood pressure), risk factors for disease (e.g. smoking status, obesity), outcome variables (e.g. survival data, length of hospital stay), etc. The purpose of statistical analysis is to process this ‘raw’ data into an organized form so that it provides the required information in summary (descriptive statistics). The inferential statistical analysis makes generalizations about the population based on a ‘representative sample’ taken from the study population; it also compares the results between different subgroups of the data sampled to determine any difference or association between the predictor (independent) and outcome (dependent) variables based on the objectives/hypotheses of the study. The purpose of inferential statistics is to determine with a degree of confidence whether the observed differences are statistically significant or may be due to chance alone using the P-value/confidence interval (Omair, 2014).

Inferential statistics are used to deduce or obtain generalizations regarding population parameters from the information obtained from samples. A parameter is a numerical characteristic of one or more populations (population mean, variance, proportion, etc.). In addition, it should be noted that statistical techniques fall into two clearly separated groups: parametric and non-parametric.

Parametric and non parametric statistics

Parametric Statistics. We can freely say that most people who use statistics are more familiar with parametric than nonparametric techniques. Parametric tests are based on the assumption that the data follow a normal or “bell-shaped” distribution. Parametric methods are often those for which we know that the population is approximately normal, or for which we can approximate using a normal distribution after we invoke the Central Limit Theorem. There are two parameters for a normal distribution: the mean and the standard deviation. Parametric tests are usually appropriate when examining either interval data or ratio data. Altman states that “parametric methods require the observations within each group to have an approximately Normal distribution; if the data do not satisfy these conditions a nonparametric method should be used”.

According to the Central Limit Theorem, when the sample size is larger than 30, normality is not a main condition for a standard t (Student) or z hypothesis test: even though the individual values within a sample might follow an unknown, non-normal distribution, the sample means (as long as the sample sizes are at least 30) will approximately follow a normal distribution (Stojanovic et al., 2018).
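
The behavior described by the Central Limit Theorem can be checked with a small simulation. The sketch below is only illustrative: it assumes an exponential population (strongly skewed, mean 1, standard deviation 1), draws 2000 samples of size 30, and compares the spread of the sample means with the value the theorem predicts, 1/√30. The population, seed and number of repetitions are arbitrary choices made for the example.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

SAMPLE_SIZE = 30     # size of each sample
N_SAMPLES = 2000     # number of repeated samples (arbitrary)

def sample_mean(n):
    """Mean of n draws from an exponential population (mean 1, SD 1, right-skewed)."""
    return statistics.fmean([random.expovariate(1.0) for _ in range(n)])

sample_means = [sample_mean(SAMPLE_SIZE) for _ in range(N_SAMPLES)]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_sd = 1.0 / SAMPLE_SIZE ** 0.5   # population SD / sqrt(n)

# Under normality, about 95 % of sample means lie within 1.96 SD of the true mean.
within_95 = sum(abs(m - 1.0) <= 1.96 * expected_sd for m in sample_means) / N_SAMPLES

print("mean of sample means:", round(mean_of_means, 3))
print("SD of sample means:", round(sd_of_means, 3), "expected:", round(expected_sd, 3))
print("share within 1.96 SD:", round(within_95, 3))
```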

Significance tests. A null hypothesis "H0" (no significant differences between samples) and an alternative hypothesis "H1" (significant differences between samples) are always considered in statistical significance tests. Rejecting the null hypothesis (p ≤ 0.05) is the more informative outcome: it indicates, at the 5 % significance level (95 % confidence), that the statistics differ among the samples. If the null hypothesis cannot be rejected (p > 0.05), less can be concluded, because the data neither demonstrate that the samples are different nor prove that they are the same.

Student t test. In most biomedical research, investigators hypothesize about the relationships of various factors, collect data to test those relationships, and try to draw conclusions about those relationships from the data collected. In many cases, investigators test relationships by comparing the average level of a factor between 2 groups or between 1 group and a standard reference.

In some research projects, the study design includes only a single sample, and the goal may be to determine whether the outcome measure for the population from which the sample was drawn has the same mean as some standard population. Determining an appropriate standard for comparison for these designs is often an issue. Nonetheless, when well-established standards exist, investigators may wish to use these standards for maximal comparability. In this situation, we might perform a 1-sample (not 1-sided) t test (Davis & Mukamal, 2006).

Example: (unpublished data)

Is lingual cortical thickness in Class I skeletal patients different from 0? (Table I).

At the 5 % significance level, it can be concluded that lingual cortical thickness is different from 0 (p ≤ 0.001).

Example: (unpublished data)

Is lingual cortical thickness in skeletal Class I patients different from 2.5 mm? (Table II).

At the 5 % significance level, there is no evidence that mean lingual cortical thickness differs from 2.5 mm (p = 0.199).
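
The one-sample t test used in these examples can be sketched as follows. The thickness values are hypothetical (the data behind Tables I and II are unpublished), and the function name `one_sample_t` is introduced only for illustration. The statistic is t = (mean − μ0) / (s / √n) with n − 1 degrees of freedom; instead of a p-value, the sketch compares |t| with the tabulated two-sided 5 % critical value for 9 degrees of freedom (2.262).

```python
import math
import statistics

def one_sample_t(values, mu0):
    """Return the t statistic and degrees of freedom for H0: population mean = mu0."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)              # sample SD (n - 1)
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1

# Hypothetical lingual cortical thickness values in mm (illustration only).
thickness_mm = [2.1, 2.4, 2.6, 2.3, 2.8, 2.5, 2.2, 2.7, 2.4, 2.6]

t_stat, df = one_sample_t(thickness_mm, mu0=2.5)

T_CRITICAL_DF9 = 2.262   # tabulated two-sided 5 % critical value for df = 9
reject_h0 = abs(t_stat) > T_CRITICAL_DF9

print("t =", round(t_stat, 3), "df =", df, "reject H0:", reject_h0)
```

With these made-up values the null hypothesis is not rejected, which means only that the sample gives no evidence of a mean different from 2.5 mm.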

Student's t-test for independent samples. The independent t-test, also called the two-sample t-test or Student's t-test, is an inferential statistical test that determines whether there is a statistically significant difference between the means of two unrelated samples on the same continuous, dependent variable. The test also asks whether a difference between two sample averages is unlikely to have occurred because of random chance in sample selection. A difference is more likely to be meaningful and “real” if (i) the difference between the averages is large, (ii) the sample size is large, and (iii) responses are consistently close to the average values and not widely spread out (the standard deviation is low). Note that before performing any independent t-test the following assumptions must be satisfied:

i) Independence: Observations within each sample must be independent (they don’t influence each other).

ii) Normal Distribution: The scores in each population must be normally distributed.

iii) Homogeneity of Variance: The two populations must have equal variances (the degree to which the distributions are spread out is approximately equal) (Usman, 2016).

Example: (unpublished data)

Is there a statistically significant difference between Class I and Class III lower cortical thickness? (Table III).

At the 5 % significance level, no significant difference was found between Class I and Class III lower cortical thickness (p = 0.931).
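
A minimal sketch of the independent-samples t-test follows, assuming equal variances as required by the assumptions listed above, so that a pooled variance is used. The two groups contain hypothetical values (the data behind Table III are unpublished), and `independent_t` is an illustrative name. The result is compared with the tabulated two-sided 5 % critical value for 8 degrees of freedom (2.306).

```python
import math
import statistics

def independent_t(a, b):
    """Pooled-variance t statistic and degrees of freedom for two independent samples."""
    na, nb = len(a), len(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * var_a + (nb - 1) * var_b) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return t, df

# Hypothetical lower cortical thickness values in mm (illustration only).
class_i = [2.1, 2.3, 2.5, 2.7, 2.9]
class_iii = [2.2, 2.4, 2.6, 2.8, 3.0]

t_stat, df = independent_t(class_i, class_iii)

T_CRITICAL_DF8 = 2.306   # tabulated two-sided 5 % critical value for df = 8
significant = abs(t_stat) > T_CRITICAL_DF8

print("t =", round(t_stat, 3), "df =", df, "significant:", significant)
```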

Student t-Test for related samples. The dependent sample t-test, sometimes called the paired sample t-test, is used when the observations in the two populations of interest are collected in pairs. Two samples are dependent (or consist of matched pairs) if the members of one sample can be used to determine the members of the other sample. Words such as dependent, repeated, before and after, matched pairs, paired and so on are hints for dependent samples. According to University of Arizona Military Reach (2009), the dependent samples t-test is used to compare two groups of scores and their means in which the participants in one group are somehow meaningfully related to the participants in the other group. One common example of such a relation is a pre-test/post-test research design. Participants at the pre-test are the same participants at the post-test, and the scores between pre- and post-test are meaningfully related. This means that the scores between pre- and post-test are dependent on each other (Gerald, 2018).

Example: (unpublished data) (Table IV). When the t-test for related samples was applied, significant differences between right and left canines were observed only in men (p = 0.008*). When arch sides were compared in women, no significant differences were observed.
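
The paired t-test reduces to a one-sample t test on the within-pair differences. The sketch below assumes hypothetical right and left canine measurements for five subjects (the data behind Table IV are unpublished); `paired_t` is an illustrative name, and 2.776 is the tabulated two-sided 5 % critical value for 4 degrees of freedom.

```python
import math
import statistics

def paired_t(first, second):
    """t statistic and degrees of freedom for paired observations (H0: mean difference = 0)."""
    diffs = [x - y for x, y in zip(first, second)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical canine measurements in mm for the same five subjects (illustration only).
right_canine = [8.1, 7.9, 8.6, 8.0, 8.4]
left_canine = [8.0, 7.7, 8.3, 7.8, 8.2]

t_stat, df = paired_t(right_canine, left_canine)

T_CRITICAL_DF4 = 2.776   # tabulated two-sided 5 % critical value for df = 4
significant = abs(t_stat) > T_CRITICAL_DF4

print("t =", round(t_stat, 3), "df =", df, "significant:", significant)
```

Because each subject serves as its own control, small but consistent differences between sides can reach significance even when the two sets of measurements overlap widely.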

One-way Analysis of Variance. Analysis of variance (ANOVA) is one of the most frequently used statistical methods in medical research. The need for ANOVA arises from the error of alpha level inflation, which increases Type 1 error probability (false positive) and is caused by multiple comparisons. ANOVA uses the statistic F, which is the ratio of between and within group variances. The main interest of analysis is focused on the differences of group means; however, ANOVA focuses on the difference of variances. The illustrated figures would serve as a suitable guide to understand how ANOVA determines the mean difference problems by using between and within group variance differences. The differences in the means of two groups that are mutually independent and satisfy both the normality and equal variance assumptions can be obtained by comparing them using a Student’s t-test. However, we may have to determine whether differences exist in the means of 3 or more groups. Most readers are already aware of the fact that the most common analytical method for this is the one-way analysis of variance (ANOVA) (Kim, 2017).

Example: (unpublished data)

One-way ANOVA was applied to determine whether significant differences existed between measurements (condyles and distances) according to age group: 1) 11 to 28 years; 2) 29 to 45 years; 3) 46 to 62 years; 4) 63 to 79 years. No significant differences were observed (Table V).
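
The F statistic described above, the ratio of between-group to within-group variance, can be sketched as follows. The three groups are hypothetical values chosen so that the arithmetic is easy to follow, and `one_way_anova_f` is an illustrative name. The sketch returns F and its degrees of freedom; the p-value would be read from an F table or obtained with statistical software.

```python
import statistics

def one_way_anova_f(groups):
    """Return F, between-group df and within-group df for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.fmean(all_values)
    k, n = len(groups), len(all_values)

    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: values around their own group mean.
    ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical measurements for three groups (illustration only).
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]

f_stat, df_between, df_within = one_way_anova_f(groups)
print("F =", f_stat, "df =", (df_between, df_within))
```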

Non parametric statistics

When data fail to fulfil the assumptions of a parametric test, researchers opt for non-parametric tests because they are less restrictive. These tests can also be used with small samples (fewer than 30 observations) and when the variables are non-metric. Although non-parametric tests do not require restrictive statistical assumptions, it is advisable that the samples meet one criterion: the same aggregates (for instance, mode, median and range) should be computable for each of them. Furthermore, non-parametric tests assume that the variables are measured on a nominal or ordinal scale. The tests can be further categorised depending on whether one or two samples are involved (Kataike et al., 2017).

Pearson’s Chi-Squared (χ²) test. The Chi-square statistic is a non-parametric (distribution free) tool designed to analyze group differences when the dependent variable is measured at a nominal level. Like all non-parametric statistics, the Chi-square is robust with respect to the distribution of the data. Specifically, it does not require equality of variances among the study groups or homoscedasticity in the data. It permits evaluation of both dichotomous independent variables, and of multiple group studies. Unlike many other non-parametric and some parametric statistics, the calculations needed to compute the Chi-square provide considerable information about how each of the groups performed in the study. This richness of detail allows the researcher to understand the results and thus to derive more detailed information from this statistic than from many others. The Chi-square is a significance statistic and should be followed with a strength statistic. Cramér’s V is the most common strength test used to test the data when a significant Chi-square result has been obtained. Advantages of the Chi-square include its robustness with respect to distribution of the data, its ease of computation, the detailed information that can be derived from the test, its use in studies for which parametric assumptions cannot be met, and its flexibility in handling data from both two group and multiple group studies. Limitations include its sample size requirements, difficulty of interpretation when there are large numbers of categories (20 or more) in the independent or dependent variables, and the tendency of Cramér’s V to produce relatively low correlation measures, even for highly significant results (McHugh, 2013).

Example: (unpublished data). The Pearson Chi-Square test was performed to determine the relationship between sex (male, female) and facial cone (triangular, quadrilateral). No association was indicated (p = 0.240) (Table VI).
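
The computation behind the Chi-square test and Cramér's V can be sketched for a 2 × 2 table as follows. The counts are hypothetical (the data behind Table VI are unpublished) and `chi_square_independence` is an illustrative name. No continuity correction is applied, and the p-value line uses the closed form of the Chi-square distribution that holds only for 1 degree of freedom.

```python
import math

def chi_square_independence(table):
    """Return the Chi-square statistic, degrees of freedom and Cramér's V for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    n_rows, n_cols = len(table), len(table[0])
    df = (n_rows - 1) * (n_cols - 1)
    cramers_v = math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))
    return chi2, df, cramers_v

# Hypothetical counts: rows = sex (male, female), columns = two facial categories.
table = [[10, 20],
         [20, 10]]

chi2, df, cramers_v = chi_square_independence(table)

# Upper-tail probability of the Chi-square distribution; this closed form is valid for df = 1 only.
p_value = math.erfc(math.sqrt(chi2 / 2))

print("chi2 =", round(chi2, 3), "df =", df,
      "p =", round(p_value, 4), "Cramér's V =", round(cramers_v, 3))
```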

Mann-Whitney U test (for two independent samples). The Mann-Whitney U test is a non-parametric statistical technique used to analyze differences between the medians of two data sets. It can be used in place of a t-test for independent samples when the values within the samples do not follow the normal or t-distribution, and also when the distribution of values is unknown. For the Mann-Whitney U test to be applied, values need to be measurable on an ordinal scale and comparable in size. The fact that all values are compared makes it distinct from the t-test, which compares the sample means. The Mann-Whitney U test evaluates the null hypothesis that both samples come from the same population or have the same median value (Milenovic, 2011).

Example: (unpublished data). We aimed to determine if there were significant differences in distances between mesiobuccal canal 1 and mesiobuccal canal 2 according to sex (male, female). After performing the normality test, we observed that samples did not behave normally (Table VII).

We performed the Mann-Whitney U test for independent samples, which indicated that the distances between mesiobuccal canal 1 and mesiobuccal canal 2 do not vary significantly between males and females (p = 0.979) (Table VIII).
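
The U statistic itself is obtained by ranking all values together and summing the ranks of each group. The sketch below assumes hypothetical distances for four males and five females (the data behind Tables VII and VIII are unpublished); `average_ranks` and `mann_whitney_u` are illustrative names. Tied values receive the average of their ranks, and the p-value would be taken from a U table or statistical software.

```python
def average_ranks(values):
    """Rank values from smallest to largest, giving tied values the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """Return U for sample a and U for sample b; the test statistic is the smaller one."""
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:len(a)])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return u_a, u_b

# Hypothetical distances in mm between the two canals (illustration only).
male = [1.2, 1.5, 1.9, 2.4]
female = [1.1, 1.6, 2.0, 2.6, 2.2]

u_male, u_female = mann_whitney_u(male, female)
print("U (male) =", u_male, "U (female) =", u_female, "U =", min(u_male, u_female))
```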

Kruskal-Wallis test (for three or more independent samples). The Kruskal-Wallis test is a nonparametric technique with which to analyze the variance. In other words, it analyzes whether there is a difference in the median values of three or more independent samples. The Kruskal-Wallis test is similar to the Mann-Whitney test in that it ranks the original data values. That is, it collects all data instances from the samples and ranks them in increasing order. If two scores are equal, each is assigned the average of the two ranks they would otherwise occupy (Nahm, 2016).

Example: (unpublished data).

We aimed to determine whether nasal index (an established quotient that describes nose width and is used to classify groups of individuals according to nose size) differs significantly among skeletal classes (class I, class II and class III). The normality test showed that the samples did not behave normally, so the Kruskal-Wallis test was applied; it indicated significant differences in nasal index among skeletal classes (p ≤ 0.001) (Table IX).
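
A minimal sketch of the Kruskal-Wallis statistic follows. The nasal index values are hypothetical (the data behind Table IX are unpublished) and the function names are illustrative. The statistic is H = 12 / (N(N + 1)) · Σ(R_i² / n_i) − 3(N + 1), where R_i is the rank sum of group i; no tie correction is applied. H is compared with the Chi-square distribution with k − 1 degrees of freedom, and the closed-form p-value used here holds only for 2 degrees of freedom (three groups) and is a rough approximation for samples this small.

```python
import math

def average_ranks(values):
    """Rank values from smallest to largest, giving tied values the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Return the H statistic (without tie correction) and its degrees of freedom."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)
    return h, len(groups) - 1

# Hypothetical nasal index values per skeletal class (illustration only).
class_i = [61, 63, 65]
class_ii = [68, 70, 72]
class_iii = [75, 77, 79]

h_stat, df = kruskal_wallis_h([class_i, class_ii, class_iii])
p_value = math.exp(-h_stat / 2)   # Chi-square upper tail; valid for df = 2 only

print("H =", round(h_stat, 3), "df =", df, "p =", round(p_value, 4))
```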

Wilcoxon signed-rank test. The Wilcoxon signed rank test is a rank-based alternative to the parametric t test that assumes only that the distribution of differences within pairs is symmetric, without requiring normality. It is used to test the null hypothesis that the median of a distribution is equal to some value, and it can be used in place of a one-sample t-test, a paired t-test or for ordered categorical data where a numerical scale is inappropriate but where it is possible to rank the observations. To use the Wilcoxon signed rank sum test, we first find the difference between the observation and the hypothesized median in the one-sample problem, or the difference between the paired observations in the paired sample problem. We then take the absolute values of these differences and rank them, either from the smallest to the largest or from the largest to the smallest, always taking note of the ranks of the absolute values with positive differences and those with negative differences. The requirement that the populations from which the samples are drawn be continuous makes it possible to state, at least theoretically, that the probability of obtaining zero differences or tied absolute values of the differences is zero (Cochran, 1950).

Example: (unpublished data). We aimed to determine whether hyperplastic and non-hyperplastic condyle width measurements differ significantly, for which the Shapiro-Wilk normality test was first performed (p = 0.027). The samples did not meet the normality assumption; therefore, the nonparametric Wilcoxon test for related samples was applied (p = 0.552), indicating no differences between measurements (Table X).
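
The ranking procedure described for the Wilcoxon signed-rank test can be sketched as follows. The paired values are hypothetical and expressed in arbitrary units (the data behind Table X are unpublished); the function names are illustrative. Zero differences are dropped, absolute differences are ranked from smallest to largest, and the ranks are summed separately for positive and negative differences. The p-value would be taken from a table or statistical software.

```python
def average_ranks(values):
    """Rank values from smallest to largest, giving tied values the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def wilcoxon_signed_rank(first, second):
    """Return (W+, W-, W), where W is the smaller of the two signed rank sums."""
    diffs = [x - y for x, y in zip(first, second) if x != y]   # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, min(w_plus, w_minus)

# Hypothetical paired condyle width measurements, arbitrary units (illustration only).
hyperplastic = [10, 12, 14, 16, 15]
non_hyperplastic = [9, 14, 11, 20, 15]

w_plus, w_minus, w_stat = wilcoxon_signed_rank(hyperplastic, non_hyperplastic)
print("W+ =", w_plus, "W- =", w_minus, "W =", w_stat)
```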

Cochran’s Q test. In statistics, in the analysis of two-way randomized block designs where the response variable can take only two possible outcomes (coded as 0 and 1), Cochran's Q test is a non-parametric statistical test to verify whether k treatments have identical effects. It is named after William Gemmell Cochran. Cochran's Q test should not be confused with Cochran's C test, which is a variance outlier test. Put in simple technical terms, Cochran's Q test requires that there only be a binary response (e.g. success/failure or 1/0) and that there be more than 2 groups of the same size. The test assesses whether the proportion of successes is the same between groups. Often it is used to assess if different observers of the same phenomenon have consistent results (interobserver variability) (Shoukri, 2004).

Example: (unpublished data). We compared three treatments for bruxism (T1, T2 and T3). Fourteen patients were randomized, and each treatment was recorded for every patient as effective (coded 1) or not effective (coded 0) (Table XI).

Example 2: (unpublished data). In our case, the test yields p = 0.001; since this is below 0.05, the null hypothesis is rejected, and it is concluded that responses to the three treatments differ.
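
Cochran's Q can be computed directly from the row and column totals of the 0/1 table. The sketch below uses six hypothetical patients rather than the fourteen of the unpublished study, and `cochran_q` is an illustrative name. With k treatments, column totals C_j, row totals R_i and grand total N, the statistic is Q = (k − 1)(k ΣC_j² − N²) / (kN − ΣR_i²), compared with the Chi-square distribution with k − 1 degrees of freedom. The closed-form p-value holds only for 2 degrees of freedom (three treatments).

```python
import math

def cochran_q(rows):
    """Return Cochran's Q and its degrees of freedom.

    rows: one list per subject, with a 0/1 outcome for each of the k treatments.
    """
    k = len(rows[0])
    col_totals = [sum(col) for col in zip(*rows)]   # successes per treatment
    row_totals = [sum(row) for row in rows]         # successes per subject
    n = sum(row_totals)                             # total number of successes
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - n * n)
    denominator = k * n - sum(r * r for r in row_totals)
    return numerator / denominator, k - 1

# Hypothetical outcomes (1 = effective, 0 = not effective) for T1, T2, T3.
outcomes = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
]

q_stat, df = cochran_q(outcomes)
p_value = math.exp(-q_stat / 2)   # Chi-square upper tail; valid for df = 2 only

print("Q =", q_stat, "df =", df, "p =", round(p_value, 4))
```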

Friedman’s test. The Friedman test provides another method for detecting a shift in location of a set of k populations. Like other non-parametric tests, it does not require any assumptions about populations. Consider a situation where a completely randomized design was used to compare the reaction times of subjects under the influence of one or two drugs. When the effect of the drug is short-lived and when the drug effect varies greatly from person to person, it may be beneficial to employ a randomized block design. Using the subjects as blocks, we would hope to remove the variability among subjects and thereby increase the amount of information in the experiment (Friday et al., 2019).

Example: (unpublished data). We used Friedman's test to compare pain levels in patients who underwent orthognathic surgery, with follow-up measurements taken at 3, 5 and 7 days after the operation; pain levels were measured according to the following scale: none, low, medium, strong and very strong (Table XII).

With p = 0.478, there is insufficient statistical evidence to reject the null hypothesis; therefore, no difference in postoperative pain levels was detected among the third, fifth and seventh days.
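
The Friedman statistic is obtained by ranking the measurements within each subject and summing the ranks per time point. In the sketch below the pain scale is coded 0 (none) to 4 (very strong), and the four patients are hypothetical (the data behind Table XII are unpublished); the function names are illustrative. With n subjects, k time points and rank sums R_j, the statistic is 12 / (nk(k + 1)) · ΣR_j² − 3n(k + 1), without tie correction, compared with the Chi-square distribution with k − 1 degrees of freedom. The closed-form p-value holds only for 2 degrees of freedom.

```python
import math

def average_ranks(values):
    """Rank values from smallest to largest, giving tied values the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def friedman_statistic(rows):
    """Return the Friedman statistic (without tie correction) and its degrees of freedom.

    rows: one list per subject, with one measurement per condition.
    """
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for j, rank in enumerate(average_ranks(row)):   # rank within each subject
            rank_sums[j] += rank
    stat = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    return stat, k - 1

# Hypothetical pain codes (0 = none ... 4 = very strong) at days 3, 5 and 7.
pain = [
    [3, 2, 1],
    [4, 3, 1],
    [2, 1, 0],
    [3, 2, 0],
]

stat, df = friedman_statistic(pain)
p_value = math.exp(-stat / 2)   # Chi-square upper tail; valid for df = 2 only

print("Friedman statistic =", stat, "df =", df, "p =", round(p_value, 4))
```

On a five-level ordinal scale ties within a subject are common, in which case a tie-corrected version of the statistic, as provided by statistical software, would be preferable.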

DISCUSSION

All health care professionals and medical researchers face the challenge of keeping abreast of a body of knowledge that is expanding at an astonishing rate. The current views on the causes, mechanisms, and treatment methods of diseases are advancing too rapidly for any physician or researcher to achieve personal experience with all the new findings. This has led to a growing reliance on the published literature to learn about new discoveries that can ultimately influence diagnostic evaluations, therapeutic decisions and public health guidelines.

An important function of any medical research journal is the effective dissemination of new findings to its target audience. To be an effective consumer, a journal reader should be familiar with the methodological aspects, especially when the techniques, such as statistical procedures, are invoked to clarify findings or summarize raw data. Statistical methods play an important role in medical publications. This is reflected in the high proportion of articles that are essentially statistical in character. Most papers published in medical journals contain some element of statistical methods, analysis and interpretation (Horton & Switzer, 2005). Statistical review has also become an important and integral part of the editorial process (Greenwood & Freeman, 2015).

Because of an increasing dependence on the medical literature, it is essential to include statistical education in medical, dental and health care (undergraduate and postgraduate) training as part of the essential topics to support understanding of new research findings. Additionally, clinicians and graduate readers of medical journals should know the frequency with which various statistical concepts are reported in journals that are important to their sub-fields. This helps readers to identify the major statistical skills needed to critically evaluate their literature. To invest their resources most efficiently, those responsible for training future practitioners and researchers should ask the following questions: How often are various statistical techniques reported in the journals of a specific sub-field? Which statistical methods are mentioned most often in their journals compared to more visible journals? Do readers of clinical versus basic science journals need different statistical expertise? Has the use of statistical techniques changed over time, or are there new methods that are currently applied more often? (Nieminen et al., 2017).

Therefore, each study design calls for a specific protocol for processing and presenting its results, built on the statistical procedures appropriate to that design.

Statistical methods have experienced significant growth in recent years, with applications in research design strategies and statistical analysis methods that are more sensitive to different designs. Guides and manuals have been developed to introduce new statistical procedures and to present various methodological issues specific to those designs. Applications of more specific statistical techniques and new software to complement various research designs are increasing.

This work aims to consolidate statistical knowledge and provide an updated approach to statistical testing, which is important and necessary for data management in morphology and dentistry. We also present adequate statistical testing options for different design types, making it possible to develop research ideas in the shortest possible time and to analyze data by means of more exact techniques, which are essential to obtain results of the required quality that comply with adequate standards. In addition, these testing methods facilitate the presentation of results in journals, thereby securing a place for investigations that contribute to dental research.

"After all, high statistics are the numerical expression of common sense" (Karl Pearson).