INTRODUCTION

Forests provide numerous ecosystem services, such as regulation of biogeochemical cycles, pollution control and food supply. Among the most acclaimed of these services are the sequestration and storage of atmospheric carbon (CO_{2}) (^{Canadell and Raupach 2008}, ^{Chazdon et al. 2016}). This service is of strategic importance in mitigating ongoing climate change because it acts directly on the control of global warming (^{Bonan 2008}).

In this context, the quantification of the carbon stock present in the most varied types of forests constitutes an important tool for monitoring this ecosystem service (^{Scolforo et al. 2015}). Many initiatives have been taken to quantify carbon stocks in forests, both by direct means (^{Dantas et al. 2021}) and through estimates from related data (indirect methods) (^{Cordeiro et al. 2018}).

Carbon stock estimation by indirect methods employs modeling and simulation techniques. Historically, modeling of forest attributes has relied on statistical models (*e.g.* ^{Melo et al. 2017}). Today these approaches share space with artificial intelligence/machine learning approaches, such as artificial neural networks, support vector machines and decision trees, which have been gaining ground as tools for forest data analysis, modeling, estimation and production prognosis. These tools have provided gains in the quality of estimates and predictions (^{Vendruscolo et al. 2015}).

An artificial neural network (ANN) is a processor consisting of simple processing units (artificial neurons), inspired by the neurons of the human brain, that compute certain functions. These units are arranged in layers and connected to each other by weights that store experimental knowledge and weight the inputs of each unit. Thus, the acquired knowledge becomes available for use (^{Braga et al. 2007}, ^{Dantas et al. 2020a}).

The most notable features in ANNs are their ability to learn and to generalize information. In other words, ANNs are able, through a learned example, to generalize assimilated knowledge to an unknown data set. Another interesting feature of ANN is the ability to extract non-explicit features from a set of information provided as examples (^{Haykin 2001}).

Support vector machines (SVM) have also proven to be an interesting alternative for mathematical modeling of complex systems (^{Heddam and Kisi 2018}). They are conceptually simple techniques that are nevertheless capable of solving extremely complex real problems. SVM is a supervised learning technique that is trained to classify different categories of data from various disciplines (^{Haykin 2001}). It has been used for two-class classification problems and is applicable to both linear and non-linear classification tasks. SVM creates a hyperplane or a set of hyperplanes in a high-dimensional space; the best hyperplane is the one that divides the data into classes with the largest separation (margin) between them (^{Steinwart and Christmann 2008}).

Initially, SVM techniques were successfully applied as a data classification methodology (^{Tong and Koller 2001}). They were later extended to regression tasks through two approaches: support vector regression (SVR) and least-squares support vector machines (LS-SVM) (^{Cherkassky and Mulier 1998}, ^{Dantas 2020b}).

Compared to ANN, SVM has the advantage of leading to an exact solution, that is, a global optimum (^{Haykin 2001}). However, finding a final SVM model may present computational complexity because it requires solving a quadratic programming model and solving a set of nonlinear equations.

The present study aims at evaluating the performance of the support vector machine and artificial neural network techniques, and at proposing a new nonlinear model for modeling above ground biomass (carbon stock), using dendrometric variables as inputs, in a secondary semideciduous seasonal forest. It is proposed, as a hypothesis, that (a) machine learning techniques are suitable for modeling above ground biomass, and (b) it is possible to extract accurate above ground biomass equations from the artificial neural network training process.

METHODS

*Study area and data collection*

The study area is a secondary semideciduous seasonal forest located in Lavras, Minas Gerais, Brazil, at coordinates 21° 14' S and 45° 00' W, with an average altitude of 900 m (figure 1). The climate is classified as Köppen's Cwb, with dry winters and mild summers (^{Alvares et al. 2013}). Mean annual rainfall is around 1,500 mm and mean annual temperature is 19.4 °C (^{Marques et al. 2019}). The forest is heterogeneous and is dominated by tree species of the genus *Anadenanthera*, popularly known as “angico”. Data come from 105 sample plots (10 × 10 m) established in the area. In each plot, all trees with diameter at breast height (DBH, measured 1.3 m from the ground) greater than or equal to 5 cm were measured, along with their respective heights.

From the data collected in the field, the following variables were obtained by plot: minimum DBH (DBHmin), mean DBH (DBHmed), maximum DBH (DBHmax), minimum total Height (Hmin), mean total Height (Hmed), maximum total Height (Hmax), Mean Square Diameter (Dq) and Number of Trees (N).

Above Ground Biomass (AGB) was estimated for each individual tree according to the equation proposed by ^{Chave et al. (2014)}, using DBH, total tree height and an average basic wood density of 0.620 g cm^{-3}. The estimate was performed in the software R (^{R Core Team 2018}), using the BIOMASS package (^{Réjou-Méchain et al. 2017}). The AGB estimate was converted to carbon stock in Mg ha^{-1} according to ^{Thomas and Martin (2012)}, a methodology that consists of multiplying AGB by 0.471, which according to the authors corresponds to the carbon concentration in the tissues of tropical forest angiosperms.
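The conversion chain described above can be sketched in a few lines. This is only an illustration and not the BIOMASS package workflow used in the study: it assumes the height-based pantropical equation of Chave et al. (2014), AGB = 0.0673 (ρ D² H)^0.976 with AGB in kg, D in cm, H in m and ρ in g cm^{-3}, and it assumes that plot totals are scaled to the hectare from the 10 × 10 m plot area. The function names and the example trees are hypothetical.

```python
def tree_agb_kg(dbh_cm, height_m, wood_density=0.620):
    """Above ground biomass of one tree (kg), assumed Chave et al. (2014) form."""
    return 0.0673 * (wood_density * dbh_cm ** 2 * height_m) ** 0.976


def plot_carbon_mg_ha(trees, plot_area_m2=100.0, carbon_fraction=0.471):
    """Carbon stock of a plot in Mg per hectare.

    trees: iterable of (dbh_cm, height_m) pairs for trees with DBH >= 5 cm.
    """
    agb_kg = sum(tree_agb_kg(dbh, height) for dbh, height in trees)
    # kg -> Mg, then scale the plot area up to one hectare (10,000 m2).
    agb_mg_ha = agb_kg / 1000.0 * (10000.0 / plot_area_m2)
    # Carbon fraction of 0.471 (Thomas and Martin 2012).
    return agb_mg_ha * carbon_fraction


# Hypothetical plot with three trees: (DBH in cm, total height in m).
example_plot = [(12.0, 9.5), (25.3, 14.0), (7.8, 6.2)]
example_carbon = plot_carbon_mg_ha(example_plot)
```

Keeping the per-tree equation separate from the plot aggregation makes it easy to swap in a different allometric model or carbon fraction.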

*Independent variables selection*

First, independent variables were selected by the stepwise method, based on the Akaike Information Criterion (AIC). The combination of variables that makes up the model with the lowest AIC is considered the best. Subsequently, the selected variables were used as inputs to model carbon stock by plot through machine learning techniques.

*Machine learning algorithms*

For carbon stock modeling, support vector machines (SVM) and artificial neural networks (ANNs) were used. The SVM construction was based on the supervised machine learning process described by ^{Haykin (2001)} and ^{Steinwart and Christmann (2008)}, in which there is a set of n ordered pairs (**X**, **Y**), where **X** is the matrix of explanatory variables of the sample and **Y** is the vector of expected values of the sample. Based on this information, a chosen function takes a vector of variables as input and predicts the expected value of the sample. A linear function has the form f(**X**) = <**W**, **X**> + b, where **W** is a weight vector and b is the bias term.

The type IV error function, also known as *eps-regression*, was used with an RBF (Radial Basis Function) kernel. Kernel functions offer an alternative solution by projecting the data into a high-dimensional feature space, which increases the computational power of the learning machine and makes it possible to represent nonlinear phenomena (^{Cristianini and Shawe-Taylor 2000}). This procedure was performed in the software R, version 3.4.1, through the *e1071* package (^{Meyer et al. 2019}).
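To make the two ingredients named above concrete, the following sketch shows an RBF kernel, the epsilon-insensitive error that underlies eps-regression, and the form of a prediction from a fitted machine. It is a didactic illustration, not the *e1071* implementation; the function names are hypothetical and the parameter values (`gamma`, `epsilon`, dual coefficients, intercept) are placeholders, not the values of the study.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance between x and z)."""
    squared_distance = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * squared_distance)


def eps_insensitive_loss(y_true, y_pred, epsilon):
    """Errors smaller than epsilon cost nothing; larger ones grow linearly."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def svr_predict(x, support_vectors, dual_coefs, intercept, gamma):
    """Prediction as a kernel-weighted sum over the support vectors."""
    weighted = sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )
    return weighted + intercept
```

Only the support vectors enter the prediction, which is why a fitted machine can be summarized by its number of support vectors and its kernel parameters.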

The trained ANNs were multilayer perceptrons (MLP), each composed of an input layer, one intermediate (hidden) layer and an output layer. The training algorithm was resilient backpropagation, in which the learning rate was automatically defined by the *neuralnet* package, with values ranging from 0.01 to 1.12.

The number of neurons in the hidden layer was chosen using k-fold cross-validation. This methodology randomly subdivides the database into k subgroups (^{Ali and Pazzani 1996}, ^{Cigizoglu and Kisi 2006}). The value of k was 10, so that in each round 90 % of the data were used for training and 10 % for testing (^{Diamantopolou 2005}). Different numbers of neurons, ranging from 1 to 20, were tested.
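The k-fold procedure described above can be sketched as follows. The fold assignment and the selection rule (lowest cross-validated error) are assumptions made for illustration; `cv_error` stands for a hypothetical user-supplied function that trains a network with a given number of hidden neurons on each training fold and returns its mean test error.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k roughly equal random folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def select_hidden_size(candidates, cv_error):
    """Return the candidate number of hidden neurons with the lowest error."""
    return min(candidates, key=cv_error)
```

With k = 10, each round holds out about one tenth of the data, which matches the 90 %/10 % proportion stated in the text.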

The logistic (sigmoid) function was used as the activation function. Its range is 0 to 1, which limits the amplitude of the outputs and inputs. Consequently, the data were normalized, that is, the values of each variable were transformed to values between 0 and 1. Linear standardization was obtained through equation [1] (^{Soares et al. 2011}); it uses the minimum and maximum values of each variable in the transformation and maintains the original distribution of the data (^{Valença 2010}).

where: *x'*: normalized value, x: original value, *xmin*: minimum value of the variable, *xmax*: maximum value of the variable, *a*: lower limit of the standardization range, *b*: upper limit of the standardization range.
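A minimal sketch of this linear standardization is given here, assuming the usual min-max form x' = a + (x - xmin)(b - a)/(xmax - xmin), which is consistent with the symbols defined for equation [1]. The function names are hypothetical. The inverse transformation is included because predictions made on the normalized scale have to be converted back to Mg ha^{-1}.

```python
def rescale(values, a=0.0, b=1.0):
    """Map values linearly from [min, max] to the range [a, b]."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        raise ValueError("a constant variable cannot be rescaled")
    return [a + (x - x_min) * (b - a) / (x_max - x_min) for x in values]


def restore(scaled, x_min, x_max, a=0.0, b=1.0):
    """Invert rescale(): map values from [a, b] back to the original units."""
    return [x_min + (s - a) * (x_max - x_min) / (b - a) for s in scaled]


# Hypothetical DBH values (cm) for illustration.
normalized = rescale([5.0, 10.0, 15.0])  # -> [0.0, 0.5, 1.0]
```

The minimum and maximum should be taken from the training data and reused for validation data, so that both sets share the same scale.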

The stopping criteria for ANN training were a maximum of 100,000 cycles or a mean square error below 1 %; training was terminated when either criterion was met. At the end of training, the best ANN was selected based on the lowest mean square error.

A nonlinear equation for tree biomass prediction was extracted from the artificial neural network. Consequently, we generated a system of equations with coefficients resulting from weights generated by the neurons of ANN. This system was used to predict the carbon stock of the plots that comprised the validation database.

Data were divided into two groups: 70 % for ANN training and SVM construction and 30 % for validation of both techniques. Among the data intended for ANN training, 70 % were used in the training phase and 30 % in the test phase.
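The nested partition described above can be sketched as follows. The random seed and the rounding of the group sizes (rounded down here) are assumptions, since the text does not state how fractional plot counts were handled.

```python
import random


def split_70_30(items, seed=0):
    """Shuffle a copy of items and split it 70 % / 30 % (size rounded down)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


plots = list(range(105))  # identifiers of the 105 sample plots

# First level: model fitting (ANN training, SVM construction) vs. validation.
fit_set, validation_set = split_70_30(plots, seed=1)

# Second level, ANN only: training phase vs. test phase.
ann_train, ann_test = split_70_30(fit_set, seed=2)
```

The validation plots never enter either level of fitting, which is what allows them to serve as unseen data for both techniques.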

*SVM and ANN performance evaluation*

SVM and ANN performances were evaluated in the training and validation phases. Accordingly, the techniques were used to predict the carbon stock in the data set intended for validation, *i.e.* data that had not been used in training. The prediction quality analysis was performed using Mean Relative Error (MRE %) (equation 2), Bias (equation 3), Root Mean Square Error (RMSE %) (equation 4) (^{Leite and Andrade 2002}, ^{Siipilehto 2000}), graphs of residuals distribution, graphs of estimated versus observed carbon stocks and the correlation coefficients between estimated and observed values.

where: *Y_{i}* represents the observed value, *Ŷ_{i}* the estimated value, *n* the number of observations and *Ȳ* the average of the observed values.
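The three criteria can be computed as in the sketch below. The exact forms are assumptions based on common usage and on the symbols defined above: the relative error is taken with respect to each observed value, bias and RMSE are expressed as a percentage of the observed mean, and the sign of the bias is chosen so that positive values indicate overestimation.

```python
import math


def mean_relative_error_pct(observed, estimated):
    """Mean absolute relative error, in percent of each observed value."""
    n = len(observed)
    return 100.0 / n * sum(abs(e - o) / o for o, e in zip(observed, estimated))


def bias_pct(observed, estimated):
    """Mean deviation in percent of the observed mean (positive: overestimate)."""
    n = len(observed)
    mean_observed = sum(observed) / n
    return 100.0 * sum(e - o for o, e in zip(observed, estimated)) / (n * mean_observed)


def rmse_pct(observed, estimated):
    """Root mean square error, in percent of the observed mean."""
    n = len(observed)
    mean_observed = sum(observed) / n
    mse = sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n
    return 100.0 * math.sqrt(mse) / mean_observed
```

Bias can be close to zero even when individual errors are large, because positive and negative deviations cancel; this is why it is reported together with MRE and RMSE.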

RESULTS

The forest with predominance of *Anadenanthera* sp. contained an average tree carbon stock (AGC) of 94.25 Mg ha^{-1}. Descriptive statistics of the variables used are presented in table 1.

Where: C = carbon stock (Mg ha^{-1}), DBHmin = minimum breast height diameter (DBH) of the sample plot (cm), DBHmax = maximum DBH of the sample plot (cm), DBHmed = mean DBH of the sample plot (cm), Hmin = minimum total height of the sample plot (m), Hmax = maximum total height of the sample plot (m), Hmed = mean total height of the sample plot (m), N = number of trees in the sample plot, Dq = mean square diameter of the sample plot (cm).

The stepwise method, based on the Akaike information criterion, indicated that minimum DBH, maximum DBH, mean DBH, mean H and N are the variables with the strongest influence on carbon stock variability; they were therefore selected for modeling. It should be noted that DBHmed and Hmed represent the main tree growth trends in each plot; DBHmin and DBHmax, the lower and upper limits of diameter growth, respectively; and N, the density of individuals in each plot. Figure 2 presents the scatter plots between the variables with their respective confidence intervals, and the distribution of each variable.

The configurations obtained with the construction of the support vector machine, which resulted in a machine with 44 support vectors, are presented in table 2.

Regarding the artificial neural network approach, figure 3 illustrates the architecture and weights of the selected ANN, which presented the lowest error among those evaluated and is composed of six neurons in the hidden layer.

From the 5-6-1 architecture artificial neural network, an equation system was extracted to predict carbon stock per plot, with coefficients derived from weights generated by neural network neurons. This system was used to predict the carbon stock of plots that make up the database for validation.

Model [5] expresses the relationship between the hidden layer and the response variable, where β_{0} is the bias and the other coefficients are weights related to each neuron. Model [6] represents the activation function used in each hidden layer neuron, derived from the logistic model. Finally, model [7] is the result of the relationship between input variables and the respective hidden layer neurons, and a model is generated for each neuron.

where: β_{0}: bias, β_{n}: model coefficient associated with neuron n, β_{kn}: model coefficient between input variable k and neuron n, z_{n}: n-th hidden layer neuron response, w_{i}: sum of products between weights and inputs.
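The way such an equation system produces a prediction can be sketched as below for a 5-6-1 network. The structure follows the description of models [5] to [7], assuming a linear output node and logistic hidden neurons; the function names are hypothetical and the weights in the example are placeholders, not the coefficients of table 3.

```python
import math


def logistic(weighted_sum):
    """Logistic activation of a hidden neuron (role of model [6])."""
    return 1.0 / (1.0 + math.exp(-weighted_sum))


def predict_carbon(inputs, hidden_biases, hidden_weights, output_bias, output_weights):
    """Evaluate the extracted equation system for one plot.

    inputs: normalized values of the 5 selected variables.
    hidden_biases, hidden_weights: one bias and one weight list per hidden neuron.
    Returns the prediction on the normalized scale.
    """
    responses = []
    for bias_n, weights_n in zip(hidden_biases, hidden_weights):
        # Role of model [7]: bias plus the sum of weight-input products.
        weighted_sum = bias_n + sum(w * x for w, x in zip(weights_n, inputs))
        responses.append(logistic(weighted_sum))
    # Role of model [5]: output bias plus the weighted hidden responses.
    return output_bias + sum(b * z for b, z in zip(output_weights, responses))


# Placeholder network: 5 inputs, 6 hidden neurons, all hidden weights zero.
example = predict_carbon(
    inputs=[0.2] * 5,
    hidden_biases=[0.0] * 6,
    hidden_weights=[[0.0] * 5 for _ in range(6)],
    output_bias=0.1,
    output_weights=[0.1] * 6,
)
```

Because the inputs are normalized before training, the value returned here is also on the normalized scale and has to be back-transformed to obtain carbon stock in Mg ha^{-1}.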

The coefficients of the system of equations extracted from the selected artificial neural network are presented in table 3.

The support vector machine and the model extracted from the artificial neural network were applied to the data set intended for validation. Both techniques performed satisfactorily in modeling carbon stock by plot, with homogeneously distributed, low-dispersion residuals and predicted values close to those observed, as can be seen in figures 4 and 5, respectively.

According to the graphs (figures 4 and 5), ANN was slightly superior: its residuals were more concentrated around zero and its predicted values were closer to the observed ones. SVM presented higher residual dispersion and two relative error values above 80 %, while ANN presented a maximum relative error of 30 %.

The performance evaluation criteria of the analyzed techniques are presented in table 4. The quality of the estimates made by SVM and by the model extracted from ANN was evaluated on both the training and the validation data sets.

The estimates of the analyzed techniques were strongly correlated with the observed values, with correlations above 0.99 in the training phase and above 0.96 in the validation phase. Error magnitudes, represented by RMSE, were below 10 % in the training phase and below 15 % in validation. The lower the RMSE, the higher the accuracy of the estimates, with zero being the optimal situation (^{Mehtätalo et al. 2006}). Bias indicated slight underestimation trends in the training phase, whereas in validation SVM tended to overestimate carbon stock values (5.6479 %) and ANN to underestimate them (-4.6217 %). The mean relative error of SVM increased from 6.7689 % in the training phase to 13.5185 % in the validation phase, whereas that of ANN increased from 4.8035 % to 6.9375 %.

DISCUSSION

The average tree carbon stock of the forest with predominance of *Anadenanthera* sp. (94.25 Mg ha^{-1}) is above the average carbon stock for this vegetation type in south-central Minas Gerais (55 Mg ha^{-1}, ^{Scolforo et al. 2015}) and compatible with other local studies in semideciduous seasonal forests in the study region. For instance, ^{Ribeiro et al. (2009)}, quantifying the biomass and tree carbon stock in a mature semideciduous forest in Viçosa, Minas Gerais, Brazil, found 166.67 Mg ha^{-1} and 83.34 Mg ha^{-1} for biomass and carbon stocks, respectively. Likewise, ^{Figueiredo et al. (2015)}, who evaluated the dynamics of the tree carbon stock in a semideciduous forest in Minas Gerais, Brazil, found an average carbon stock of 71.81 Mg ha^{-1}.

Although the average carbon stock was relatively high, its variation among plots (CV%) was also significant. This is mainly due to the fact that the carbon stock variable reflects the variations in other dendrometric variables. Moreover, in natural multiage forests, characteristics such as high ecological complexity, spatial variations in species structure and composition, the presence of clearings and other factors can lead to great variability in biomass/carbon stock values (^{Soriano-Luna et al. 2018}) among sample plots.

There is a strong relationship between mean DBH and mean height. Overall, carbon stock tends to be positively correlated with the other variables, except for minimum DBH, for which there was no directly or inversely proportional relationship, as evidenced by the elliptical shape of the scatter between these variables. In general, the relationship between carbon and the independent variables tended towards linearity. There are, however, some nonlinear behaviors between variables, such as between carbon stock and maximum DBH. In this context, it is worth emphasizing that artificial intelligence/machine learning has the ability to implicitly detect any nonlinear relationship between the response variable and explanatory variables. In addition, these techniques make no assumptions regarding input data, such as independence and normality, and they have a high capacity for learning and generalization.

The main purpose of using these techniques, as in classical regression, is their application to data that were not used in their training. ANN presented better performance in the two analyzed phases, training and validation, which indicates its superiority for modeling the carbon stock per plot in the analyzed data set, when compared to SVM.

ANN was able, with the available variables, to explain almost all the variation in carbon stock in the study area. Several studies have demonstrated the superiority of artificial neural networks when compared to other techniques (^{Özçelik et al. 2013}, ^{Vendruscolo et al. 2015}). This superiority can be explained by the ability of neural networks to detect implicit information and nonlinear relationships between the response variable and explanatory variables provided as examples and to generalize the assimilated knowledge to an unknown data set.

It is worth noting that, although it underperformed ANN, SVM was very efficient in estimating carbon stock in the study area. SVM has the advantage over ANN that no evaluation is required after its construction, whereas with ANN the best network must be selected after training. This is due to the quadratic optimization performed during SVM training (^{Cristianini and Shawe-Taylor 2000}), which yields the same result for a given configuration whenever it is applied to the same data set. ANNs, in contrast, have more elements to be manipulated, and the neuron parameters are initialized at random (^{Haykin 2001}). Thus, each trained network will produce slightly different estimates, even if the same architecture is maintained. These differences highlight the practicality of SVM in relation to ANN, as SVM excludes the subjectivity of the operator in choosing the best network to be applied to the database.

Therefore, both approaches were able to explain much of the carbon stock variation in the study area. This is mainly due to the robustness of ANN and SVM. The small part of the carbon stock variation not explained by the variables in question is due to various factors not considered in the present study that are known to affect the variability of carbon stock in forests, such as species diversity, forest size and degree of anthropization, among many others, and their interactions (^{McNicol et al. 2018}).

The results presented in this study provide insights for future assessments of the use of machine learning techniques to obtain carbon stock estimates. Some examples of potential applications are estimates of carbon stock in other forest compartments, such as soil and tree roots, carbon stock estimates through the association between machine learning techniques and remote sensing variables, among others.

CONCLUSIONS

The present study brings important contributions in the modeling of carbon stocks in forests through the use of machine learning. The machine learning techniques performed satisfactorily, and a new model extracted from an artificial neural network for carbon stock prediction in Mg ha^{-1} is efficient, with potential application in other secondary semideciduous seasonal forests.

Carbon stock modeling using a model extracted from the artificial neural network training presented better performance than that presented by the support vector machine, using the same variables of the analyzed data set.

Determining, modeling and supplying forest carbon stock data are currently strong scientific and social demands, as tree carbon storage is considered a key environmental service in mitigating current climate change by sequestering CO_{2} from the atmosphere.