INTRODUCTION

In forest surveys, several dendrometric variables are measured in the field, notably the diameter at 1.30 m above the ground (DBH) and total height. DBH is considered the main variable because it is measured directly and is easy to obtain. Total height is also of great importance, but it is measured indirectly and is difficult to obtain in surveys because of factors such as poor visibility of the tree tops and the time required to complete the measurements. Besides reducing measurement accuracy, these factors significantly increase the cost of forest inventories.

In 1957, Ker and Smith proposed the use of hypsometric relationships: the diameters (DBH) and heights of some trees in the plot are measured, a height-diameter curve (hypsometric relationship) is fitted, and the heights of the remaining trees are estimated from it. Since then, several height prediction models have been proposed in the literature (^{Curtis 1967}, ^{Inoue and Yoshida 2004}, ^{Campos and Leite 2009}).

The quality of hypsometric relationships is known to be influenced by several factors besides DBH, such as site, age, genetic material and silvicultural treatments. Including these factors in hypsometric models can improve both the quality of the estimates and their biological realism. However, modeling and quantifying the influence of these characteristics on the estimated variable is difficult, since the relationships are non-linear or involve qualitative (categorical) values (^{Binoti 2012}).

With the advance of evolutionary computing and the spread of artificial intelligence, artificial neural networks (ANN) have been widely used as an alternative to hypsometric models and for modeling and forecasting forest yield. ^{Dantas et al. (2020}) assessed volume estimation for *Eucalyptus* spp. trees using machine learning and observed a marked decrease in residual standard error, from 0.0142 m³ (7.9830 %) with a nonlinear fixed-effects regression model to 0.0024 m³ (0.6060 %) with ANN. ^{Freitas et al. (2020}) used ANN to estimate eucalyptus productivity from environmental variables and obtained a correlation higher than 85 % between the estimated and observed mean annual increment of six-year-old eucalyptus stands, with root mean square error below 15 %.

An ANN is an algorithm built from simple processing units (artificial neurons), modeled on the neurons of the human brain, each of which computes a specific function (^{Braga et al. 2007}). These units are arranged in layers and connected by weights that store the knowledge acquired from examples and weight the inputs of each unit. In this way, the acquired knowledge becomes available for use.

The most striking features of ANN are their ability to learn and to generalize: after learning from examples, an ANN can apply the assimilated knowledge to previously unseen data. Another useful feature is the ability to extract implicit features from the information provided as examples (^{Gorgens et al. 2015}).

One aspect to consider when adopting ANN as a modeling tool in forest management is the possibility of reducing the number of measurements needed to train the networks without losing estimation quality. This would reduce both data collection time and the cost of forest inventories.

One of the most important pieces of information to determine the potential of a forest in a given region is the variable “volume,” the accurate quantification of which is essential in forest management planning. The individual volume serves as a starting point for assessing the wood content in a forest stand and provides support for decisions related to silvicultural practices and timber harvesting and transport. Thus, it is essential that the volume of trees be correctly determined to provide an accurate representation of the sampled population.

The search for methods that provide accurate estimates while reducing the cost and time of measurement is ongoing, and studies are needed that support managers in processing forest inventory data. The objective of this work is therefore to propose ANN-based models and evaluate their performance in estimating the total height of eucalyptus trees and the total volume of eucalyptus stands. The hypothesis is that artificial neural networks allow the number of height measurements in forest inventories to be reduced without loss of accuracy in the estimates.

METHODS

*Data base*

The study area comprises 28 management units, totaling 900 hectares, planted with four *Eucalyptus* spp. clones (MG01, MG02, MG03 and MG04) in the municipality of Minas Novas, Minas Gerais, Brazil. According to the Köppen classification, the climate of the region is tropical with a dry winter (Aw), with dry winters, hot rainy summers and an average annual temperature of 22.2 °C. Average annual precipitation is 961 mm (^{Alvares et al. 2013}).

The data came from forest inventories of 4-year-old plantations at 3 × 3 m spacing. A total of 100 rectangular sample plots of 870 m² were measured, comprising 9,378 trees. In each plot, the diameter at 1.30 m above the ground (DBH, in cm) of all trees was measured, together with the total height (Ht, in m) of 20 trees and the total height (in m) of the five dominant and codominant trees (Hd).

*Data processing*

For processing, three forms of dominant height were computed for each plot: the height of the tallest tree in the plot (Hd1), the average height of the two tallest trees (Hd2) and the average height of the three tallest trees (Hd3). The basal area of the plot (Gparc, in m²) and the dominant DBH (DBHd), the average DBH of the five dominant trees, were also computed for each plot.
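The plot-level inputs above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes Hd1 to Hd3 are taken from the measured heights of the dominant trees, and uses the usual basal-area formula g = πDBH²/40000 (DBH in cm, g in m²). The function name and the example values are hypothetical.

```python
import math
from statistics import mean

def plot_variables(dbh_cm, dominant_heights_m, dominant_dbh_cm):
    """Compute the plot-level inputs Hd1, Hd2, Hd3, Gparc and DBHd.

    dbh_cm:             DBH (cm) of every tree in the plot.
    dominant_heights_m: total heights (m) of the measured dominant trees.
    dominant_dbh_cm:    DBH (cm) of the five dominant trees.
    """
    tallest = sorted(dominant_heights_m, reverse=True)
    return {
        "Hd1": tallest[0],              # tallest tree in the plot
        "Hd2": mean(tallest[:2]),       # mean of the two tallest trees
        "Hd3": mean(tallest[:3]),       # mean of the three tallest trees
        # plot basal area in m²: g = pi * DBH² / 40000, with DBH in cm
        "Gparc": sum(math.pi * d ** 2 / 40000 for d in dbh_cm),
        "DBHd": mean(dominant_dbh_cm),  # dominant DBH
    }

# Hypothetical plot, for illustration only
print(plot_variables([12.1, 14.3, 15.0, 16.8], [22.4, 23.9, 21.8, 23.1, 22.0],
                     [15.2, 16.8, 16.1, 15.9, 16.4]))
```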

Four groups were created, differing in the number of trees per plot used to train the ANN: (a) G1, with one tree as the training sample, one as the test sample and one as the validation sample; (b) G3, with three training trees, one test tree and one validation tree; (c) G5, with five training trees, one test tree and one validation tree; and (d) GT, with all trees except the test and validation trees as the training sample, plus one test tree and one validation tree. None of the groups included the trees used to calculate the dominant height of each plot. In addition, the training, test and validation trees within each plot were distinct, so that during training each network used exactly the specified number of trees in each sample.
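The per-plot sampling scheme for groups G1, G3, G5 and GT can be sketched as below. The source does not say how the individual trees were chosen, so the random draw here is an assumption, and `split_plot_trees` and `GROUPS` are hypothetical names. The sketch enforces the two constraints stated in the text: dominant trees are excluded, and the three samples never overlap.

```python
import random

# Training trees per plot in each group; None means "all remaining trees" (GT)
GROUPS = {"G1": 1, "G3": 3, "G5": 5, "GT": None}

def split_plot_trees(height_tree_ids, dominant_ids, n_train, seed=None):
    """Split one plot's height-measured trees into disjoint samples.

    Dominant trees are excluded first; one tree goes to the test sample,
    one to the validation sample, and n_train (or all the rest) to training.
    """
    excluded = set(dominant_ids)
    pool = [t for t in height_tree_ids if t not in excluded]
    needed = 2 + (1 if n_train is None else n_train)
    if len(pool) < needed:
        raise ValueError(f"plot has {len(pool)} eligible trees, needs {needed}")
    random.Random(seed).shuffle(pool)  # assumption: random selection
    test, validation, rest = pool[:1], pool[1:2], pool[2:]
    train = rest if n_train is None else rest[:n_train]
    return {"train": train, "test": test, "validation": validation}

trees = list(range(1, 21))  # 20 hypothetical tree numbers in one plot
for name, n in GROUPS.items():
    print(name, split_plot_trees(trees, dominant_ids=[], n_train=n, seed=42))
```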

*Training of artificial neural networks*

The ANN for estimating total tree height were obtained by training, that is, by adjusting the network weights with a learning algorithm that extracts characteristics from the data so that the network performs the task of interest (^{Binoti et al. 2014}). Training was performed in R, version 3.4.1, with the neuralnet package (^{Günther and Fritsch 2010}).

The trained ANN were Multilayer Perceptron (MLP) networks with an input layer, one intermediate (hidden) layer and an output layer. The learning algorithm was resilient backpropagation, in which the learning rate is adapted automatically by the neuralnet package; its values ranged from 0.01 to 1.12. The number of neurons in the hidden layer was chosen by k-fold cross-validation, which randomly partitions the database into k subsets (Wong *et al.* 2017). k was set to 10, so that in each round 90 % of the data were used for training and 10 % for testing (Diamantopoulou 2010). Numbers of neurons from 1 to 20 were tested.
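A minimal sketch of choosing the number of hidden neurons by 10-fold cross-validation follows. It only shows the resampling and selection logic: `fit_and_score` is a hypothetical, user-supplied callable that would train a network (for example through neuralnet, as in the study) and return its test error. Picking the candidate with the lowest mean fold error is an assumption, since the text does not state the selection statistic.

```python
import random
from statistics import mean

def k_fold_indices(n, k=10, seed=None):
    """Randomly partition range(n) into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def select_hidden_neurons(data, fit_and_score, k=10,
                          candidates=range(1, 21), seed=None):
    """Return (n_hidden, mean_error) with the lowest mean test-fold error.

    fit_and_score(train, test, n_hidden) is hypothetical: it should train a
    network with n_hidden neurons on `train` and return its error on `test`.
    """
    folds = k_fold_indices(len(data), k, seed)
    best_n, best_err = None, float("inf")
    for n_hidden in candidates:
        errors = []
        for test_idx in folds:                 # each fold is the test set once
            held_out = set(test_idx)
            train = [data[j] for j in range(len(data)) if j not in held_out]
            test = [data[j] for j in test_idx]
            errors.append(fit_and_score(train, test, n_hidden))
        err = mean(errors)
        if err < best_err:
            best_n, best_err = n_hidden, err
    return best_n, best_err

# Toy scorer whose error is minimal at 7 neurons, only to exercise the loop
print(select_hidden_neurons(list(range(100)), lambda tr, te, h: (h - 7) ** 2))
```

With k = 10 each training set holds about 90 % of the records and each test fold about 10 %, matching the split described above.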

The activation function was the logistic (sigmoid) function, whose output lies between 0 and 1 and thus bounds the amplitude of each neuron's output. For this reason, the data were normalized by transforming the values of each variable to the range 0 to 1 with equation [1] (Soares et al., 2011). Because this equation uses only the minimum and maximum of each variable, it is a linear rescaling that preserves the shape of the original data distribution (Valença 2010).

$$x' = \frac{(b - a)(x - x_{min})}{x_{max} - x_{min}} + a \qquad [1]$$

Where:

x’: normalized value.

x: original value.

x_{min}: minimum value of the variable.

x_{max}: maximum value of the variable.

a: lower limit of the normalization range.

b: upper limit of the normalization range.
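The min-max normalization defined by the variables above, x' = (b − a)(x − x_min)/(x_max − x_min) + a, can be sketched as follows. The inverse transform is not stated in the text; it follows algebraically and is included here because network outputs come back on the normalized scale and must be converted to meters. Function names are illustrative.

```python
def normalize(values, a=0.0, b=1.0):
    """Min-max scaling: map [x_min, x_max] linearly onto [a, b]."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        raise ValueError("variable is constant; scaling is undefined")
    scaled = [(b - a) * (x - x_min) / span + a for x in values]
    return scaled, (x_min, x_max)

def denormalize(scaled, x_min, x_max, a=0.0, b=1.0):
    """Invert the scaling to bring network outputs back to original units."""
    return [(s - a) * (x_max - x_min) / (b - a) + x_min for s in scaled]

heights, (h_min, h_max) = normalize([10.0, 15.0, 20.0])
print(heights)                            # scaled to the 0-1 range
print(denormalize(heights, h_min, h_max))  # back to meters
```

The minimum and maximum must be stored from the training data and reused when new trees are scaled, otherwise the same height would map to different inputs.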

Training stopped as soon as either of two criteria was met: a maximum of 100,000 cycles or a mean squared error below 1 %. At the end of training, the best ANN were selected based on the smallest mean squared error.
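To illustrate how resilient backpropagation adapts its step sizes and how the two stopping rules interact, here is a minimal sketch on a toy one-neuron linear model rather than a full MLP. It implements the iRprop- variant (only the sign of each gradient is used; steps grow by 1.2 when the sign persists and halve when it flips), which is not necessarily the variant the authors used; neuralnet itself offers several (its default is rprop+). In neuralnet the corresponding arguments are `stepmax` (default 1e+05) and `threshold`, which the package documents as a threshold on the partial derivatives of the error function. The step-size limits below are hypothetical defaults.

```python
def sign(v):
    return (v > 0) - (v < 0)

def mse_and_gradient(w, xs, ys):
    """MSE of the toy model y = w[0] + w[1] * x, and its gradient."""
    n = len(xs)
    res = [w[0] + w[1] * x - y for x, y in zip(xs, ys)]
    mse = sum(r * r for r in res) / n
    grad = [2 * sum(res) / n, 2 * sum(r * x for r, x in zip(res, xs)) / n]
    return mse, grad

def train_irprop_minus(xs, ys, max_cycles=100_000, error_threshold=0.01,
                       delta0=0.1, eta_plus=1.2, eta_minus=0.5,
                       delta_min=1e-6, delta_max=50.0):
    """Train with iRprop- until error < threshold or the cycle cap is hit."""
    w = [0.0, 0.0]
    delta = [delta0] * len(w)
    g_prev = [0.0] * len(w)
    for cycle in range(1, max_cycles + 1):
        mse, g = mse_and_gradient(w, xs, ys)
        if mse < error_threshold:                   # error criterion met
            return w, mse, cycle
        for i in range(len(w)):
            if g[i] * g_prev[i] > 0:                # same sign: grow the step
                delta[i] = min(delta[i] * eta_plus, delta_max)
            elif g[i] * g_prev[i] < 0:              # sign change: shrink it
                delta[i] = max(delta[i] * eta_minus, delta_min)
                g[i] = 0.0                          # iRprop-: skip this update
            w[i] -= sign(g[i]) * delta[i]           # only the sign of g is used
            g_prev[i] = g[i]
    mse, _ = mse_and_gradient(w, xs, ys)
    return w, mse, max_cycles                       # cycle cap reached

xs = [0.0, 0.25, 0.5, 0.75, 1.0]
print(train_irprop_minus(xs, [1 + 2 * x for x in xs], error_threshold=1e-4))
```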

In each group (G1, G3, G5 and GT), 12 combinations of input variables were evaluated, organized as training sessions: (a) three sessions without dominant height as an input variable, (b) three with Hd1, (c) three with Hd2 and (d) three with Hd3. Each set of three sessions comprised: (i) one session with the categorical variable Clone and the continuous variable DBH; (ii) one with Clone and the continuous variables DBH, DBHd and Gparc; and (iii) one without categorical variables and with the continuous variables DBH, DBHd and Gparc. In each session, 50 ANN were trained, and the best-performing network was retained based on the coefficient of determination and the sum of squared errors in training, testing and validation. This process yielded 48 networks, one for each combination of the four groups and the 12 training sessions.
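The experimental design can be enumerated to check the count: 4 groups × 4 dominant-height options × 3 variable sets = 48 sessions, and with 50 networks per session, 2,400 networks trained in total. The labels follow the pattern seen in the results (for example 5-H1-2, which the text describes as five training trees, Hd1, and Clone with DBH, DBHd and Gparc). That naming rule is inferred rather than stated in the source, and the "T" prefix for group GT and "S" for "no dominant height" are hypothetical.

```python
from itertools import product

# Training trees per plot; "T" (all trees, group GT) is a hypothetical label
GROUPS = ["1", "3", "5", "T"]
# Dominant-height option; "S" = no dominant height (inferred from "5-S-2")
HD_OPTIONS = {"S": None, "H1": "Hd1", "H2": "Hd2", "H3": "Hd3"}
# Variable sets (i), (ii) and (iii) from the text
SESSIONS = {
    1: ["Clone", "DBH"],
    2: ["Clone", "DBH", "DBHd", "Gparc"],
    3: ["DBH", "DBHd", "Gparc"],
}
NETWORKS_PER_SESSION = 50

def network_configs():
    """Map each session label to its list of input variables."""
    configs = {}
    for group, (hd_code, hd_var), (session, inputs) in product(
            GROUPS, HD_OPTIONS.items(), SESSIONS.items()):
        name = f"{group}-{hd_code}-{session}"
        configs[name] = inputs + ([hd_var] if hd_var else [])
    return configs

configs = network_configs()
print(len(configs), "sessions,", len(configs) * NETWORKS_PER_SESSION, "networks")
print(configs["5-H1-2"])
```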

To assess the quality of ANN training, the heights of all individuals used in the network adjustments were estimated and the Bias % and the Root Mean Square Error (RMSE %) were calculated (^{Siipilehto 2000}, ^{Leite and Andrade 2002}).
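The text does not print the Bias % and RMSE % formulas. The sketch below assumes the common percentage forms, Bias % = 100 · mean(Ht_obs − Ht_est) / mean(Ht_obs) and RMSE % = 100 · sqrt(mean((Ht_obs − Ht_est)²)) / mean(Ht_obs). With observed minus estimated in the numerator, a negative Bias signals overestimation, which matches the sign convention used later in the results.

```python
import math
from statistics import mean

def bias_pct(observed, estimated):
    """Bias % = 100 * mean(obs - est) / mean(obs); negative => overestimation."""
    return 100 * mean(o - e for o, e in zip(observed, estimated)) / mean(observed)

def rmse_pct(observed, estimated):
    """RMSE % = 100 * sqrt(mean((obs - est)^2)) / mean(obs)."""
    mse = mean((o - e) ** 2 for o, e in zip(observed, estimated))
    return 100 * math.sqrt(mse) / mean(observed)

obs = [20.0, 22.0, 24.0]   # hypothetical measured heights (m)
est = [21.0, 22.0, 25.0]   # hypothetical network estimates (m)
print(bias_pct(obs, est), rmse_pct(obs, est))
```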

Bias and RMSE were used as criteria for choosing the networks with the best performance in the training phase; however, good training performance does not guarantee good generalization to unknown data. To assess generalization, the five best ANN were selected based on Bias and RMSE and used to estimate the heights of the trees without field-measured height. For the remaining trees, the heights measured in the inventory were kept.
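The selection-and-filling step can be sketched as follows. The text does not say how Bias and RMSE were combined into a single ranking, so ranking by |Bias %| + RMSE % is purely a hypothetical rule; the filling logic (keep field heights, estimate only the missing ones) follows the text. Names and values are illustrative.

```python
def top_networks(scores, k=5):
    """Return the k network names with the smallest |Bias %| + RMSE %.

    scores maps a network name to (bias_pct, rmse_pct). Summing |Bias| and
    RMSE is a hypothetical ranking rule, not one stated in the study.
    """
    return sorted(scores, key=lambda name: abs(scores[name][0]) + scores[name][1])[:k]

def complete_heights(measured_m, estimated_m):
    """Keep field-measured heights; fill missing ones (None) with estimates."""
    return [m if m is not None else e for m, e in zip(measured_m, estimated_m)]

print(top_networks({"a": (1.0, 3.0), "b": (-0.5, 2.0), "c": (3.0, 5.0)}, k=2))
print(complete_heights([20.0, None, 22.5], [19.0, 21.0, 23.0]))
```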

*Hypsometric and volumetric models*

The hypsometric model cited by ^{Campos and Leite (2009}) [2] was adopted as the reference for height estimation because of its good performance, which can be attributed to the use of dominant height as one of its independent variables (^{Leite and Andrade 2003}). The model was fitted separately for each Clone using all trees with measured height. The dominant height used was the average height of the five tallest trees in each plot, which is the standard procedure already adopted by the forest company. The fitted model was then applied to the inventory data to estimate the height of the trees without field-measured height.

After the heights were estimated with the hypsometric model and with the five best ANN, the linear equations of ^{Schumacher and Hall (1933}) [3] were fitted by Clone, using taper data from the rigorous cubing of 159 trees. In these trees, Ht and DBH were measured, together with diameters at the base of the stem (0.1 m), at 0.5 m, 1 m, 1.5 m and 2 m and, above 2 m, every 2 m. Individual volumes were obtained with the Smalian formula. The fitted equations were then applied to the inventory data to estimate the volume of each stem with each of the six available heights (one estimated by the hypsometric model and five estimated by the five best ANN). Finally, the volume per hectare of each plot was estimated, and the average volume per hectare of each management unit was calculated.
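The Smalian cubing step can be sketched as below: each section between consecutive measurement heights contributes the mean of its two end cross-sectional areas times its length. How the stump below 0.1 m and the tip above the last measured diameter were handled is not stated, so this sketch ignores the stump and treats the tip as a cone; both are assumptions, and the example stem is hypothetical.

```python
import math

def basal_area_m2(d_cm):
    """Cross-sectional area (m²) for a diameter in cm."""
    return math.pi * d_cm ** 2 / 40000

def smalian_volume(heights_m, diameters_cm, total_height_m):
    """Stem volume (m³) from section diameters using Smalian's formula.

    heights_m / diameters_cm: measurement heights (e.g. 0.1, 0.5, 1, 1.5, 2,
    4, 6, ...) and the diameters there, in increasing height order.
    """
    volume = 0.0
    sections = zip(zip(heights_m, diameters_cm),
                   zip(heights_m[1:], diameters_cm[1:]))
    for (h1, d1), (h2, d2) in sections:
        # Smalian: mean of the two end areas times section length
        volume += (basal_area_m2(d1) + basal_area_m2(d2)) / 2 * (h2 - h1)
    # assumption: tip from the last measured section up to Ht is a cone
    tip_length = total_height_m - heights_m[-1]
    volume += basal_area_m2(diameters_cm[-1]) * tip_length / 3
    return volume

# Hypothetical stem, for illustration only
h = [0.1, 0.5, 1.0, 1.5, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0]
d = [17.0, 15.8, 15.2, 14.9, 14.6, 13.9, 13.1, 12.2, 11.0, 9.6, 7.9, 5.8, 3.2]
print(round(smalian_volume(h, d, 20.5), 4))
```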

Where:

Ht = total height, in meters.

DBH = diameter, in centimeters, at 1.30 m above the ground.

Hd = dominant height, in meters, given by the average height of the five tallest trees in the plot.

β_{0}, β_{1} and β_{2} = parameters of the model.

*e* = random error.

vol = volume in m³.
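A sketch of fitting the volume model by ordinary least squares follows. Equation [3] does not appear in the extracted text, so this assumes the logarithmic (linearized) Schumacher and Hall form commonly used in practice, ln(vol) = β0 + β1·ln(DBH) + β2·ln(Ht) + e. The back-transform applies no logarithmic bias correction (such as Meyer's factor), which is a simplifying choice of this sketch. The per-hectare expansion uses the 870 m² plot area from the text.

```python
import math

def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0] * 3
    for r in range(2, -1, -1):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x

def fit_schumacher_hall(dbh, ht, vol):
    """OLS estimates (b0, b1, b2) of ln(vol) = b0 + b1 ln(DBH) + b2 ln(Ht)."""
    rows = [(1.0, math.log(d), math.log(h)) for d, h in zip(dbh, ht)]
    y = [math.log(v) for v in vol]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve3(xtx, xty)  # normal equations (X'X) b = X'y

def predict_volume(b, dbh, ht):
    """Back-transformed volume (m³), without log-bias correction."""
    return math.exp(b[0] + b[1] * math.log(dbh) + b[2] * math.log(ht))

def volume_per_hectare(tree_volumes_m3, plot_area_m2=870.0):
    """Expand the summed tree volumes of one plot to m³/ha."""
    return sum(tree_volumes_m3) * 10000 / plot_area_m2
```

In the study this fit would be repeated for each Clone, and `predict_volume` applied to every inventory tree once per height source (hypsometric model and each of the five ANN).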

*Evaluation of estimates*

The quality of the height estimates was evaluated through the total volume, per sample plot and per management unit, using: the Mean Relative Error (MRE) between the estimated volumes (Vest), obtained from the heights estimated by the five ANN, and the observed volumes (Vobs), here defined as the volumes obtained from the heights estimated by the hypsometric model; scatter plots of estimated versus observed volumes; and the correlation coefficients between estimated and observed volumes.
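The two volume comparison statistics can be sketched as below. The MRE formula is not given in the text, so this assumes the per-unit relative error 100·(Vest − Vobs)/Vobs, with its mean absolute value as the summary (the results refer to the "average of the MRE modules"). The Pearson coefficient is computed directly so the sketch has no dependency beyond the standard library.

```python
import math
from statistics import mean

def relative_errors_pct(v_est, v_obs):
    """Relative error (%) of each estimated volume against its reference."""
    return [100 * (e - o) / o for e, o in zip(v_est, v_obs)]

def mean_abs_relative_error_pct(v_est, v_obs):
    """Mean of the absolute relative errors (%)."""
    return mean(abs(r) for r in relative_errors_pct(v_est, v_obs))

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

v_obs = [180.0, 205.0, 160.0]   # hypothetical reference volumes (m³/ha)
v_est = [186.0, 198.0, 163.0]   # hypothetical ANN-based volumes (m³/ha)
print(relative_errors_pct(v_est, v_obs))
print(mean_abs_relative_error_pct(v_est, v_obs), pearson_r(v_est, v_obs))
```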

RESULTS

In all four groups, the networks trained with dominant height as an input variable performed better, with a smaller sum of squared errors (figure 1), in the training, testing and validation phases reported by the software.

The ANN were used to estimate the heights of the trees with measured height, so that each of these trees has one observed height and 48 estimated heights. The Bias and RMSE values used to evaluate network performance are presented in tables 1 and 2, respectively. Bias values close to zero indicate little systematic error; negative values indicate overestimation and positive values indicate underestimation. RMSE indicates the average magnitude of the error.

Bias and RMSE indicated different network performances depending on the input variables used. Comparing the dominant-height options, the largest Bias values were found in networks without dominant height as an input variable, which tended to overestimate height. Networks trained without Hd showed Bias between -11.82 and 3.51 %, whereas networks that used Hd showed a narrower range, from -5.61 to 2.43 %. The three ways of calculating dominant height produced similar Bias values: between -4.79 and 2.30 % for Hd1, -5.61 and 2.43 % for Hd2, and -5.53 and 1.87 % for Hd3.

Consistent with their larger bias, networks without Hd also showed larger errors, as reflected in higher RMSE values. The average RMSE was 5.62 % for networks without Hd, 3.73 % for networks with Hd1, 3.79 % with Hd2 and 3.70 % with Hd3. The maximum RMSE was 12.55 % in networks without Hd and 7.54 % in networks with Hd. As with Bias, the maximum RMSE values of the three types of Hd were close to each other: 6.82, 7.54 and 7.27 % for Hd1, Hd2 and Hd3, respectively. This indicates that the height of a single dominant tree can be enough to generate networks with good estimation capacity.

The results also show that the simultaneous absence of Hd and of the categorical variable Clone in training negatively influenced the quality of the estimates. Among the ANN trained without both Clone and Hd, Bias varied between -11.82 and 3.51 % and the maximum RMSE was 12.55 %. These networks tended to overestimate heights, especially for Clone MG01, whose trees have a lower average height: lacking information on maximum heights or on the Clone of each tree, the networks applied the pattern observed in the other Clones, whose trees are taller. In networks trained with Clone but without Hd, Bias was between -4.24 and 2.55 % and the maximum RMSE was 9.19 %. Networks trained with both Clone and Hd showed Bias between -3.74 and 2.43 % and a maximum RMSE of 5.26 %.

Adding continuous variables (DBH, DBHd, Gparc and Hd) reduced Bias and RMSE further. Networks that used the categorical variable but no continuous variable other than DBH showed Bias between -3.74 and 2.43 % and a maximum RMSE of 6.42 %, whereas networks trained with both continuous and categorical variables showed Bias between -1.42 and 1.84 % and a maximum RMSE of 4.49 %.

The ANN trained with all trees as the training sample produced the lowest Bias and RMSE values; however, these values differed little from those of networks trained with fewer trees. For networks with one training tree per plot, Bias ranged from -11.82 to 3.40 % and the maximum RMSE was 12.55 %; with three trees, Bias ranged from -11.55 to 2.44 % and the maximum RMSE was 12.50 %; with five trees, Bias ranged from -11.00 to 2.93 % and the maximum RMSE was 11.62 %; and with all trees, Bias ranged from -10.08 to 3.51 % and the maximum RMSE was 11.22 %.

Considering the performance of networks (lowest values of Bias and RMSE), the best five were selected, whose minimum and maximum RMSE and Bias values are shown in table 3.

Fitting the reference hypsometric model produced four equations, one for each Clone. Their coefficients and coefficients of determination (R²) are shown in table 4. In all fitted equations, the coefficients associated with DBH and Hd were significant by the t test (*P* < 0.05).

All hypsometric equations had lower coefficients of determination (R²) than the ANN: the lowest R² found in network training was 0.7661 (figure 1), while the highest R² among the fitted hypsometric models was 0.7655 (table 4). This result agrees with ^{Haykin (2001}), who showed that ANN can have greater estimation capacity than regression models. The volumetric model of ^{Schumacher and Hall (1933}) was also fitted by Clone to obtain its parameters and coefficients of determination (R²); in all fitted equations, the coefficients associated with DBH and Ht were significant by the t test (*P* < 0.05).

The volumes per hectare of each sample plot, estimated with the fitted Schumacher and Hall models from DBH and the heights estimated by the five best ANN, differed among networks. Taking the hypsometric model cited by ^{Campos and Leite (2009}), fitted by Clone, as the reference for height estimation, the Mean Relative Error (MRE %) of the ANN-based estimates was generally below 10 % (figure 2). In the ANN trained with Hd as an input variable, the percentage errors were less dispersed around zero, indicating more precise estimates. In the ANN trained without Hd, the percentage errors were highly dispersed despite the small variation in Bias and RMSE values (figure 2).

The MRE was closest to zero for the networks trained with five trees in the training sample. This advantage is small, however, and does not rule out the network trained with three trees, whose MRE values were generally below 5 %.

Considering the estimated volumes per hectare, for each plot, it is confirmed that Hd contributed significantly to obtain more accurate estimates (table 5).

In the 5-S-2 network, nine of the 28 MRE values exceeded 10 %, with a maximum of 26 %. In the 3-H2-2, 5-H2-2 and 5-H3-3 networks, only one of the 28 values exceeded 10 %; the maximum MRE was 12 % for 3-H2-2, 13 % for 5-H2-2 and 11 % for 5-H3-3. The 5-H1-2 network had no MRE value above 10 %, with a maximum of 7 %, and also had the lowest mean absolute MRE (table 6).

DISCUSSION

The results show that categorical and continuous variables that represent the characteristics of the plots, especially Clone, are important in training ANN to obtain more accurate estimates. These variables carry information about the specificities of each Clone, stand or project, which reduces, for example, the transfer of characteristics observed in one Clone to others with different behavior. Among categorical variables, only Clone was available in the data used here; additional information such as soil type, site preparation, precipitation, spatial arrangement and radiation may further improve the quality of the estimates.

Using Hd improved the estimates, and networks trained with the height of the tallest tree in the plot performed similarly to those trained with Hd computed from the average height of more than one dominant tree.

Reducing the number of trees in the training sample did not substantially affect network performance. Using five trees as the training sample, for example, can already save considerable time and cost in the forest inventory, and the maximum RMSE of networks trained with all trees differed from that of networks trained with five trees by only 0.40 percentage points (11.22 versus 11.62 %).

For the forest company that provided the data, the number of Ht measurements per plot, currently 25 (20 regular trees and five dominant trees), could be reduced to eight: seven regular trees (five for the training sample, one for the test sample and one for the validation sample) and one dominant tree. This 68 % reduction in height measurements would shorten measurement time and, consequently, lower the cost of the forest inventory and increase the efficiency of the measurement team.

^{Binoti et al. (2013}), studying the effect of reducing Ht measurements on the precision obtained by ANN, evaluated the estimates obtained when fewer plots had measured Ht and also concluded that the number of measurements can be reduced without loss of accuracy. According to the same authors, applying ANN to estimate tree Ht can reduce the cost of the forest inventory.

According to ^{Leite and Andrade (2003}), the dominant height allows representing different productive capacities of the places where the plots are located. This is important since the relationship between total height and DBH of trees can differ among plots located in areas with lower, medium or higher productivity.

The most precise networks were those trained with five trees per plot (although the number of trees can be reduced to three without major loss of accuracy), with dominant height as an input regardless of how many trees are used to calculate it (1, 2 or 3), and with categorical and continuous variables that distinguish the strata, such as Clone, Gparc and DBHd.

More specifically, the best performance was obtained by the 5-H1-2 network, trained with five trees as the training sample, one as the test sample and one as the validation sample; dominant height from the tallest tree in the plot; the categorical variable Clone; and the continuous variables DBH, DBHd and Gparc (figure 3).

An equation system to predict the individual height of *Eucalyptus* spp. trees was extracted from the artificial neural network, with coefficients given by the weights generated by the ANN. Model (4) expresses the relationship between the hidden layer and the response variable, where β_{0} is the bias and the other coefficients are the weights associated with each neuron. Model (5) is the activation function used in each hidden-layer neuron, derived from the logistic model. Finally, model (6) relates the input variables to each hidden-layer neuron, generating one model per neuron.

Where β_{0}: bias; β_{n}: coefficient of the model associated with neuron n; β_{k.n}: coefficient of the model between input variable k and neuron n; z_{n}: response of the n-th neuron of the hidden layer; w_{i}: sum of the products between the weights and the inputs.
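The equation system can be read as the forward pass of a one-hidden-layer network, sketched below. Two details are assumptions not confirmed by the text: each hidden neuron is given its own bias term (written here as the first entry of its coefficient list), and the output neuron is linear (neuralnet's `linear.output = TRUE` setting). The prediction is on the normalized scale and is converted to meters with the inverse of the 0-1 normalization. The coefficients of table 7 are not reproduced, so the numbers in the example are hypothetical.

```python
import math

def logistic(w):
    """Logistic activation: maps a neuron's weighted sum to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-w))

def predict_height_normalized(x, beta0, beta_out, beta_in):
    """Forward pass of the extracted equation system.

    x:        normalized inputs x_1..x_K.
    beta_in:  for each hidden neuron n, [bias_n, β_1.n, ..., β_K.n]
              (the per-neuron bias is an assumption).
    beta_out: β_1..β_N linking hidden neurons to the output.
    """
    z = []
    for coeffs in beta_in:
        w = coeffs[0] + sum(b * xk for b, xk in zip(coeffs[1:], x))  # input-to-hidden sum
        z.append(logistic(w))                                        # neuron response z_n
    return beta0 + sum(b * zn for b, zn in zip(beta_out, z))         # output, Model (4)

def to_meters(y_norm, ht_min, ht_max):
    """Undo 0-1 min-max normalization of height (assumes a = 0, b = 1)."""
    return y_norm * (ht_max - ht_min) + ht_min

# Hypothetical 2-input, 2-neuron network, for illustration only
y = predict_height_normalized([0.4, 0.7], 0.1, [0.6, 0.3],
                              [[-0.2, 1.1, 0.5], [0.3, -0.4, 0.9]])
print(to_meters(y, 8.0, 30.0))
```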

The coefficients of the system of equations extracted from the artificial neural network are presented in table 7.

It can therefore be inferred that the ANN performed satisfactorily in estimating the total height of the trees studied, which was then used to obtain individual volumes and volumes per unit area. This tool is thus applicable to estimating the total height of eucalyptus trees and allows the number of measurements required per plot to be reduced without significantly affecting the accuracy of the estimates.

Another important aspect, given the convenience it provides to the modeler, is that, unlike regression models, no separate fitting per stratum is needed, since a single ANN can represent all strata (^{Haykin 2001}).

Diamantopoulou (2005) reports that the quality of ANN estimates stems from their ability to model several variables at once and to cope with common problems in forest data, such as non-linear relationships, non-Gaussian distributions, outliers and missing data.

CONCLUSIONS

The present study considerably improves the modeling of the height and volume of *Eucalyptus* spp. trees using machine learning. The technique performed satisfactorily: the artificial neural network models proposed here to estimate the total height of eucalyptus trees are efficient, and their application is recommended because of the substantial reduction in the number of tree heights to be measured in the field.

For the data used, the best-performing model was trained with five trees as the training sample, one as the test sample and one as the validation sample; dominant height from the tallest tree in the plot; the categorical variable Clone; and the continuous variables diameter at 1.30 m above the ground (DBH), dominant diameter (DBHd) and plot basal area (Gparc).