## Chilean journal of agricultural research

*On-line version* ISSN 0718-5839

### Chilean J. Agric. Res. v.70 n.3 Chillán sep. 2010

#### http://dx.doi.org/10.4067/S0718-58392010000300010

Chilean Journal of Agricultural Research 70(3):428-435 (July-September 2010)

**RESEARCH**

**Comparison of Regression and Neural Networks Models to Estimate Solar Radiation**

**Comparación de Regresión y Modelos de Redes Neuronales para Estimar la Radiación Solar**

**Mónica Bocco^{1*}, Enrique Willington^{1}, and Mónica Arias^{2}**

^{1}Universidad Nacional de Córdoba, Facultad de Ciencias Agropecuarias, CC 509-5000 Córdoba, Argentina. *Corresponding author (mbocco@gmail.com).

^{2}Universidad Nacional de Salta, Facultad de Ciencias Naturales, 4400 Salta, Argentina.

**ABSTRACT**

The incident solar radiation on soil is an important variable used in agricultural applications; it is also relevant in hydrology, meteorology and soil physics, among others. To estimate this variable, empirical models have been developed using several parameters and, recently, prognostic and prediction models based on artificial intelligence techniques such as neural networks. The aim of this work was to develop linear models and neural networks of the multilayer perceptron type to estimate daily global solar radiation, and to compare their efficiency when applied to a region of the Province of Salta, Argentina. Relative sunshine duration, maximum and minimum temperature, rainfall, binary rainfall and extraterrestrial solar radiation data for the period 1996-2002 were used. All data were supplied by Experimental Station Salta, Instituto Nacional de Tecnología Agropecuaria (INTA), Argentina. For both neural network models and linear regressions, three alternative combinations of meteorological parameters were considered. Good results were obtained with both prediction methods, with root mean square error (RMSE) values between 1.99 and 1.66 MJ m^{-2} d^{-1} for linear regressions and neural networks, and coefficients of correlation (r^{2}) between 0.88 and 0.92, respectively. Even though both neural networks and linear regression models can be used to predict the daily global solar radiation appropriately, neural networks produced better estimates.

**Key words:** modeling, prediction, linear regression, multilayer perceptron.

**RESUMEN**

La radiación solar incidente en el suelo es una variable importante usada en aplicaciones agronómicas, además es relevante en hidrología, meteorología y física del suelo, entre otros. Para estimarla se han desarrollado modelos empíricos que utilizan distintos parámetros meteorológicos y, recientemente, modelos de pronóstico y predicción basados en técnicas de inteligencia artificial tales como redes neuronales. El objetivo de este trabajo fue desarrollar modelos lineales y de redes neuronales, del tipo perceptrón multicapa, para estimar la radiación solar global diaria y comparar la eficiencia de los mismos en su aplicación para una región de la Provincia de Salta, Argentina. Se utilizaron datos de heliofanía relativa, temperaturas máxima y mínima, precipitación, precipitación binaria y radiación solar astronómica provistos por la Estación Experimental Salta, Instituto Nacional de Tecnología Agropecuaria (INTA), Argentina, correspondientes al período 1996-2002. Tanto para los modelos de redes neuronales como para las regresiones lineales se consideraron tres alternativas de combinaciones de los parámetros meteorológicos, obteniéndose buenos resultados con ambas metodologías de predicción, con valores de la raíz del error cuadrático medio variando desde 1.99 a 1.66 MJ m^{-2} d^{-1} y coeficientes de correlación de 0.88 a 0.92. Se concluye que ambos, los modelos de redes neuronales y las regresiones lineales, pueden ser usados para predecir en forma adecuada la radiación solar global diaria; si bien las redes neuronales produjeron mejores resultados.

**Palabras clave:** modelos, predicción, regresiones lineales, perceptrón multicapa.

**INTRODUCTION**

The incident solar radiation on soil is an important variable used in agricultural applications, particularly for modeling crop development, values of soil moisture, potential evapotranspiration and photosynthesis, among others. It is also important in hydrology, meteorology and soil physics. Moreover, the availability of these data, or their estimation based on specific sites or mechanistic prediction models, improves the usefulness of the climate data sets (Ball *et al*., 2004).

In places where radiation measurements are sparse, theoretical estimations of the available solar energy can be used to predict these measurements from standard weather parameters that are extensively measured (air temperature, relative humidity, effective sunshine duration and cloudiness) (Santamouris *et al*., 1999).

While solar energy data are recognized as very important, their acquisition is not easy. The measurement of solar radiation requires the use of expensive equipment, and in developing countries there are not always adequate facilities to mount viable monitoring programs. Therefore, there have been several attempts to estimate solar radiation through the use of meteorological and physical parameters (Togrul and Togrul, 2002). The lack of observed atmospheric variables prevents the use of many analytical procedures unless those variables are estimated by other methods (De la Casa *et al*., 2003).

Several empirical models have been developed to calculate global solar radiation using various parameters, of which relative sunshine duration is the most commonly used. In 1924 Angstrom used a linear relationship between global radiation and sunshine duration; a modified version of this correlation, proposed by Prescott in 1940, has been the most convenient and widely used for estimating global solar radiation. This is known as the Angstrom-Prescott equation (Podestá *et al*., 2004).
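The Angstrom-Prescott relation is commonly written as H/H0 = a + b (n/N), where H is global radiation, H0 is extraterrestrial radiation and n/N is relative sunshine duration. The following is a minimal sketch of fitting a and b by ordinary least squares. The function names and the data values are invented for illustration and are not taken from the paper.

```python
def fit_angstrom_prescott(rel_sunshine, clearness):
    """Least-squares fit of clearness = a + b * rel_sunshine.

    rel_sunshine: n/N values in [0, 1]; clearness: H/H0 values.
    Returns the intercept a and the slope b.
    """
    n = len(rel_sunshine)
    mean_x = sum(rel_sunshine) / n
    mean_y = sum(clearness) / n
    sxx = sum((x - mean_x) ** 2 for x in rel_sunshine)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(rel_sunshine, clearness))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def estimate_radiation(a, b, rel_sunshine, extraterrestrial):
    """Global radiation estimate, in the same units as `extraterrestrial`."""
    return (a + b * rel_sunshine) * extraterrestrial


# Invented data generated from a = 0.25 and b = 0.50 (not measured values).
sunshine = [0.1, 0.3, 0.5, 0.7, 0.9]
ratio = [0.25 + 0.50 * s for s in sunshine]
a, b = fit_angstrom_prescott(sunshine, ratio)
```

With real station data the fitted coefficients are site-specific, which is why the relation is usually recalibrated for each location.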

Almorox *et al*. (2008) adequately estimated global solar radiation for 11 meteorological stations in Venezuela from sunlight data using a linear regression to calculate the Angstrom-Prescott equation. Falayi *et al*. (2008) developed, for Nigeria, multilinear regression equations to predict the relationship between global solar radiation and different weather parameters.

A simple and fast physically based method for the estimation of global solar radiation using meteorological satellite data was presented by Wloczyk and Richter (2006). For an irrigated agricultural area, the distribution of net radiation flux density was analyzed using a method that combines satellite remote sensing with field observation (Folhes *et al.*, 2006).

Most of the studies used to predict solar radiation were based on time series methods (including regression analysis), which are limited in the number of parameters that they can accurately handle. In particular, Fortin *et al*. (2008) developed a long-utilized linear approach based on latitude and daily temperature range. In addition, estimations of daily radiation resulting from an Angstrom-Prescott relationship have adequate accuracy at a monthly scale, but are not accurate at a daily scale (Ceballos *et al*., 2005).

Recently, prognostic and prediction models based on artificial intelligence techniques such as neural networks (NN) have been developed. These models can handle a large amount of data, predict the contribution of each input to the outcome and provide prompt and adequate predictions (Al-Alawi and Al-Hinai, 1998). Using neural networks, Bocco *et al*. (2006) developed models to estimate solar radiation at Córdoba (Argentina), Mohandes *et al*. (1998) for Saudi Arabia and Fortin *et al*. (2008) for Canada.

Within this methodology, the multilayer perceptron is probably the most commonly used algorithm with the architecture of neural networks because of its capacity to tolerate information that is incomplete, inaccurate or contaminated with noise (Mas and Flores, 2008). The multilayer perceptron consists of a non-parametric statistical model of nonlinear regression which generally uses a single hidden layer to completely divide the spectral space by means of hyperplanes along which the level of activation of hidden units is constant (Foody, 2000).

The aim of this work was to develop linear models and neural networks to estimate daily global solar radiation from commonly observed meteorological data and compare the overall efficiency of these models and networks in an application to a region of the Province of Salta (Argentina).

**MATERIALS AND METHODS**

**Site of application**

Daily values of meteorological variables, including radiation, for the 1996-2002 period, were provided by the Experimental Station Salta (24°54' S, 65°29' W, 1234 m a.s.l.), Instituto Nacional de Tecnología Agropecuaria (INTA), Argentina. All data were collected with an automatic weather station, Vantage Pro2 Stations (Davis Instruments, Hayward, California, USA). The agrometeorological station is part of the National Climate Network and takes weather observations three times a day. The type and location of the instruments with which samples are taken are both standardized by the World Meteorological Organization and the National Meteorological Service (Estación Experimental Agropecuaria Salta, INTA). The astronomical solar radiation corresponding to this site was calculated using the Solar-Calc software by USDA-ARS (2007).
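For readers who want to reproduce the astronomical (extraterrestrial) radiation input without Solar-Calc, the sketch below uses the daily formula from FAO Irrigation and Drainage Paper 56. This is a swapped-in, widely used approximation and is not necessarily the computation that Solar-Calc performs. The function name and the two example days are assumptions made for illustration.

```python
import math

SOLAR_CONSTANT = 0.0820  # MJ m^-2 min^-1, value used in FAO-56


def extraterrestrial_radiation(latitude_deg, day_of_year):
    """Daily extraterrestrial radiation in MJ m^-2 d^-1 (FAO-56 formula).

    latitude_deg is negative in the southern hemisphere.
    """
    lat = math.radians(latitude_deg)
    # Inverse relative Earth-Sun distance.
    dr = 1.0 + 0.033 * math.cos(2.0 * math.pi * day_of_year / 365.0)
    # Solar declination in radians.
    decl = 0.409 * math.sin(2.0 * math.pi * day_of_year / 365.0 - 1.39)
    # Sunset hour angle in radians.
    ws = math.acos(-math.tan(lat) * math.tan(decl))
    return (24.0 * 60.0 / math.pi) * SOLAR_CONSTANT * dr * (
        ws * math.sin(lat) * math.sin(decl)
        + math.cos(lat) * math.cos(decl) * math.sin(ws)
    )


SALTA_LATITUDE = -(24 + 54 / 60)  # 24°54' S, from the station description
ra_summer = extraterrestrial_radiation(SALTA_LATITUDE, 355)  # late December
ra_winter = extraterrestrial_radiation(SALTA_LATITUDE, 172)  # late June
```

Because the station is in the southern hemisphere, the December value is larger than the June value.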

**Linear models**

The statistical analysis began by studying the distribution of the observed radiation. There were 2550 observations, with an average value of 14.19 MJ m^{-2} d^{-1} and minimum and maximum values of 1.20 and 28.80 MJ m^{-2} d^{-1}, respectively. The coefficient of asymmetry was -0.07, and the 25th and 75th percentiles were 10.29 and 18.50 MJ m^{-2} d^{-1}, respectively.

For the variable under study, there were extreme values (minimum and maximum) and concentration of the values close to the average.

The meteorological parameters used to estimate solar radiation from the statistical correlations were maximum and minimum temperatures (°C), rainfall (mm), binary rainfall (a binary function with value 1 for days with precipitation and 0 for days without), relative sunshine duration (%) and astronomical solar radiation (MJ m^{-2} d^{-1}). For all variables we performed a correlation analysis to obtain a measure of the magnitude and direction of the association between each pair of variables. Since many stations in Argentina only have instruments to measure and record some meteorological variables, treating rainfall as a binary variable is a very useful tool.

For linear regression analysis three possible parameter combinations were considered: Regression R1: daily values of maximum temperature (Tmax), minimum temperature (Tmin), rainfall (R), relative sunshine duration (RSD) and astronomical solar radiation (ASR); Regression R2: daily values of maximum temperature, minimum temperature, binary rainfall (BinR), relative sunshine duration and astronomical solar radiation; and Regression R3: daily values of maximum temperature, minimum temperature, rainfall and astronomical solar radiation.
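The three predictor combinations can be sketched as ordinary least squares fits. The example below solves the normal equations with a small Gaussian elimination so that it needs only the standard library. All data and coefficients are invented for illustration, the helper names are hypothetical, and the paper's own statistical software is not specified.

```python
import random


def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def fit_linear_model(rows, target):
    """Ordinary least squares; returns [intercept, b1, b2, ...]."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, target)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def binary_rainfall(rain_mm):
    """1 for a day with precipitation, 0 otherwise."""
    return 1.0 if rain_mm > 0.0 else 0.0


# Predictor combinations R1, R2 and R3 as described in the text.
PREDICTORS = {
    "R1": lambda d: (d["tmax"], d["tmin"], d["rain"], d["rsd"], d["asr"]),
    "R2": lambda d: (d["tmax"], d["tmin"], binary_rainfall(d["rain"]), d["rsd"], d["asr"]),
    "R3": lambda d: (d["tmax"], d["tmin"], d["rain"], d["asr"]),
}

# Invented daily records; the coefficients below are NOT the paper's results.
rng = random.Random(0)
TRUE_COEFFS = [1.0, 0.2, -0.1, -0.05, 0.15, 0.3]
days = []
for _ in range(60):
    days.append({
        "tmax": rng.uniform(15.0, 35.0),
        "tmin": rng.uniform(0.0, 15.0),
        "rain": rng.uniform(1.0, 30.0) if rng.random() < 0.3 else 0.0,
        "rsd": rng.uniform(0.0, 100.0),
        "asr": rng.uniform(20.0, 42.0),
    })
radiation = [
    TRUE_COEFFS[0] + sum(c * v for c, v in zip(TRUE_COEFFS[1:], PREDICTORS["R1"](d)))
    for d in days
]
fits = {name: fit_linear_model([pick(d) for d in days], radiation)
        for name, pick in PREDICTORS.items()}
```

Since the synthetic target is an exact linear function of the R1 predictors, the R1 fit recovers the generating coefficients, while R2 and R3 can only approximate it.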

**Neural networks models**

A neural network (NN) model, the multilayer perceptron, was used to estimate the incident solar radiation. This procedure is a mathematical model that performs a computational simulation of the behaviour of neurons in the human brain by replicating, on a small scale, the brain's patterns in order to produce results from the events perceived, i.e. it is a model based on learning from a set of training data. The main characteristic of NN is their capacity for learning by example. This means that by using a NN there is no need to program how the output is obtained, given certain input; the NN will learn the existing input-output relationship by means of a learning algorithm. This learning will materialize in the network's topology and in the value of its connections. Once the NN has learnt to carry out the desired function, input values for which the output is unknown can be entered, and the NN will calculate the output.

The NN are composed of a number of interconnected processing elements which are joined by weighted connections. The training algorithm adjusts the connection weights through an iterative procedure in which the error is minimized (Ashish *et al*., 2004). The amount of training data required for successful classification increases exponentially with increased dimensionality of the input data (Dixon and Candade, 2008). The Multilayer Perceptron (Figure 1) is a fully connected multilayer feed forward supervised learning network with symmetric hyperbolic tangent activation functions, trained by the back-propagation algorithm to minimize a quadratic error.

**Figure 1. Schematic map of a multilayer perceptron artificial neural network.**

The general steps that describe the training algorithm of the proposed networks are described, according to Bocco *et al.* (2006), as follows: Initialize the weights in the net with random values (step 1); read an input pattern X_{p}: (*x*_{p1}, *x*_{p2}, ..., *x*_{pN}) and the desired output *d* (step 2); generate the output calculated by the *net* for the presented input. To do so, the values of the answers in each layer are obtained, until the output layer is reached (step 3). The *net* for the hidden neurons (H_{j}) coming from the input (*net*) is calculated as follows:

$$net_{pj} = \sum_{i=1}^{N} w_{ji}\, x_{pi} + \theta_{j} \qquad [1]$$

where the sub-index *p* corresponds to the *p*-th training vector, *j* to the *j*-th hidden neuron, *w*_{ji} is the weight of the connection between I_{i} and H_{j}, and the term *θ*_{j} is the minimum threshold that the neuron must reach for its activation. Based on these inputs, the outputs of the hidden neurons, *y*_{pj}, are calculated using an activation function *f*:

$$y_{pj} = f(net_{pj}) \qquad [2]$$

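The same is done to obtain the result of the neuron in the output layer, which takes the hidden-neuron outputs as its inputs (the superscript *o* denotes the output layer):

$$net_{p}^{o} = \sum_{j} w_{j}^{o}\, f(net_{pj}) + \theta^{o} \qquad [3]$$

$$y_{p} = f(net_{p}^{o}) \qquad [4]$$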

Once all neurons have an activation value for a given input pattern, the algorithm continues by calculating the error for each neuron, except for those in the input layer (step 4). For the neuron in the output layer, if the answer is *y*_{p}, such error (*δ*_{p}) can be expressed as:

$$\delta_{p} = (d_{p} - y_{p})\, f'(net_{p}^{o}) \qquad [5]$$

where *d*_{p} is the desired output for pattern *p*, *net*_{p}^{o} is the net input of the output neuron and *f'* is the derivative of the activation function.

If the neuron *j* is not an output one, then the derivative of the error cannot be directly calculated. The error in the hidden layers depends on all the terms of the error in the output layer. For this reason the procedure is called backpropagation.

To update the weights, the recursive algorithm starts with the output neuron and works backwards until the input layer is reached (step 5). This process is repeated *n* times, until an acceptably low squared error (*E*_{p}) is reached for all the learned patterns (step 6):

$$E_{p} = \frac{1}{2}\,(d_{p} - y_{p})^{2} \qquad [6]$$

where *d*_{p} and *y*_{p} are the desired and calculated outputs for pattern *p*.
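The six training steps can be illustrated with a minimal sketch of a one-hidden-layer perceptron with hyperbolic tangent activations and per-pattern weight updates. The learning rate, the initial weight range, the number of epochs and the synthetic two-input target are assumptions made for illustration; they are not the settings or data used in this work.

```python
import math
import random


def init_network(n_inputs, n_hidden, rng):
    """Step 1: random initial weights; the last entry of each row is the threshold."""
    hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
              for _ in range(n_hidden)]
    output = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    return hidden, output


def forward(network, x):
    """Step 3: propagate one input pattern through the hidden and output layers."""
    hidden, output = network
    h = [math.tanh(sum(w * xi for w, xi in zip(row[:-1], x)) + row[-1])
         for row in hidden]
    y = math.tanh(sum(w * hj for w, hj in zip(output[:-1], h)) + output[-1])
    return h, y


def train_pattern(network, x, d, rate):
    """Steps 4-5: compute the error terms, then update the weights backwards."""
    hidden, output = network
    h, y = forward(network, x)
    delta_out = (d - y) * (1.0 - y * y)  # tanh'(net) = 1 - tanh(net)^2
    # Hidden-layer errors are derived from the output error (back-propagation).
    delta_hidden = [delta_out * output[j] * (1.0 - h[j] ** 2)
                    for j in range(len(h))]
    for j, hj in enumerate(h):
        output[j] += rate * delta_out * hj
    output[-1] += rate * delta_out
    for j, row in enumerate(hidden):
        for i, xi in enumerate(x):
            row[i] += rate * delta_hidden[j] * xi
        row[-1] += rate * delta_hidden[j]
    return 0.5 * (d - y) ** 2


def train(network, patterns, epochs, rate=0.05):
    """Step 6: repeat over all patterns; returns the summed squared error per epoch."""
    history = []
    for _ in range(epochs):
        history.append(sum(train_pattern(network, x, d, rate) for x, d in patterns))
    return history


# Step 2 needs input patterns and desired outputs; these are synthetic.
rng = random.Random(1)
patterns = []
for _ in range(40):
    x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
    patterns.append((x, 0.6 * x[0] - 0.3 * x[1]))  # invented target within (-1, 1)
network = init_network(n_inputs=2, n_hidden=10, rng=rng)
history = train(network, patterns, epochs=200)
```

Because the output neuron also uses a hyperbolic tangent, real radiation values would have to be rescaled into the interval (-1, 1) before training and mapped back afterwards.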

In our work, the size of the input layer that receives the information from various parameters that affect the radiation is the number of variables (described in detail later) and the output layer has one neuron which indicates the predicted total daily solar radiation (Est Rad). The number of neurons in the hidden layer and the number of hidden layers are selected during the training process.

The final process of this technique is the validation that always requires a separate data set for which we know the phenomenon behaviour and on which errors are estimated. The aim was to verify the efficiency of the designed NN.

The training process used 50% of the data, taken at random from the 1996-2002 period, and 2000 iterations were performed. The validation process was carried out with the other half of the data, all corresponding to Salta. To evaluate the models' performance, the statistical parameters root mean squared error (RMSE) and correlation coefficient (r^{2}) were considered.
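A minimal sketch of the random half split and the two evaluation statistics follows. It assumes that r^{2} is the square of the Pearson correlation between observed and estimated values, which the text does not state explicitly. The function names and the four example values are invented.

```python
import math
import random


def split_half(records, seed=0):
    """Randomly split records into training and validation halves."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def rmse(observed, estimated):
    """Root mean squared error, in the units of the inputs."""
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)


def r_squared(observed, estimated):
    """Square of the Pearson correlation (an assumed definition of r^2)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_e = sum(estimated) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(observed, estimated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return cov * cov / (var_o * var_e)


# 2550 daily records, represented here only by their indices.
train_idx, valid_idx = split_half(list(range(2550)))

# Invented values in MJ m^-2 d^-1, for illustration only.
observed = [10.0, 14.0, 18.0, 22.0]
estimated = [11.0, 13.0, 19.0, 21.0]
example_rmse = rmse(observed, estimated)
example_r2 = r_squared(observed, estimated)
```

Fixing the seed makes the split reproducible, so the same training half can be reused for the regressions and the networks.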

The use of neural networks (NN) has opened new perspectives since they do not make assumptions about the data distribution (Walthall *et al.*, 2004). Using a Shapiro-Wilk test (modified) and a Kolmogorov test for goodness-of-fit, it was verified that the observed solar radiation does not follow a normal distribution (p < 0.05 in both tests).

NN models considered three alternatives of the variables for the input layer, equivalent to the parameter combinations in linear regressions: (1) Model M1: daily values of Tmax, Tmin, rainfall, RSD and ASR (the same parameters as R1); (2) Model M2: daily values of Tmax, Tmin, binary rainfall, RSD and ASR (analogous to R2); and (3) Model M3: daily values of Tmax, Tmin, rainfall and ASR (parameters of R3).

The three models were constructed with an input layer of four (M3) or five (M1 and M2) neurons and one hidden layer of 10 neurons.

With the aim of comparing the linear regression results with the developed NN model results, correlations and regressions were performed using only half the data, exactly the same data set used in the training phase of the NN.

**RESULTS AND DISCUSSION**

The results of the validation process of all models allowed the calculation of different statistic values between observed and estimated values of solar radiation (Table 1).

**Table 1. Root mean squared error (RMSE) and correlation coefficient (r^{2}) in the validation phase for the different models.**

When studying the results of the correlation analysis, there was a correlation between the observed radiation (Ob Rad) and the variables (p-values < 0.05 in row 1 Table 2), except for rainfall. The coefficients between the observed radiation and the dependent variables analyzed (column 1), point to a positive correlation of various sizes for Tmax, RSD and ASR, and negative value for binary rainfall (Table 2).

**Table 2. Correlation matrix between solar radiation and other meteorological variables.**

In the scatter plots of Figure 2 the relationship between the observed radiation and the other variables is shown. The graphics displayed a high correlation with the RSD, a low correlation with Tmin, no correlation with rainfall and a high correlation with the Tmax, although this correlation does not correspond to a linear model.

**Figure 2. Scatter plots of relationships between solar radiation and other meteorological variables.**

The regression coefficients for linear models R1, R2 and R3 were:


These regression correlation values ranged between r^{2} = 0.88 and 0.64. For Nigeria, Falayi *et al*. (2008) found, when they related the ratio between observed and astronomical radiation, correlation values ranging between r^{2} = 0.56 and r^{2} = 0.97 according to the construction of regression equations with respect to a single variable or a combination between RSD, ratio of minimum and maximum temperature, relative humidity and monthly average daily temperature.

The NN models presented high correlation values. In particular, M1 obtained RMSE = 1.66 MJ m^{-2} d^{-1} and M2 obtained RMSE = 1.68 MJ m^{-2} d^{-1}, whereas M3 obtained RMSE = 2.97 MJ m^{-2} d^{-1}; consequently, M1 and M2 can be used to make good estimates of daily global solar radiation values from registered data of daily maximum and minimum temperature, rainfall (or binary rainfall for M2), RSD and theoretical ASR.

In order to analyze the performance of the models with the best fit (M1, M2, R1 and R2), scatter plots of observed against estimated solar radiation values were produced (Figure 3).

**Figure 3. Scatter plots between observed and estimated solar radiation in Salta (Argentina) for M1, M2, R1 and R2 models.**

The obtained results are considered a good estimate of global solar radiation because they are consistent with those published by other authors. Podestá *et al*. (2004) for the Humid Pampa, applying the Angstrom-Prescott equation, reported RMSE between 1.54 and 1.90 MJ m^{-2} d^{-1} using relative sunshine duration, and when they used temperature and precipitation RMSE increased to 3.23 and 4.28 MJ m^{-2} d^{-1}.

For Canada, Fortin *et al*. (2008) developed a multiple-layer perceptron network (the same kind of NN used in this work) to estimate surface incoming solar radiation on a horizontal surface, obtaining, with different input variables, RMSE between 3.83 and 5.45 MJ m^{-2}. Using NN for Córdoba (Argentina), Bocco *et al*. (2006), with thermal amplitude, rainfall, cloudiness and RSD data, obtained RMSE similar to those estimated for Salta, with values ranging between 3.15 and 3.88 MJ m^{-2} d^{-1}.

In the analyzed models, the temporal evolution of the calculated radiation values shows a seasonal pattern that fits correctly to annual variation of solar radiation. As an example, Figure 4 shows the temporal evolution of the values estimated by model M1.

**Figure 4. Evolution of estimated solar radiation for the M1 model for Salta (Argentina) 1996-1998.**

The results show that M1 and M2, for NN models, and R1 and R2, for linear regression, have the lowest RMSE values. These also have the highest correlation coefficients (Table 1). Comparing the statistics of M1, M2 and M3 with R1, R2 and R3, respectively, smaller values of error and higher correlation coefficients were observed for neural networks. This could be due to the nonlinearity of the relationship of solar radiation with any of the considered variables and, as noted by Verger *et al*. (2008), NN allow good estimates for complex and nonlinear problems.

The RMSE and r^{2} values of both the M3 model and R3 show the importance of RSD data for estimating the total daily solar radiation: although the coefficient r^{2} = 0.73 for M3 indicates a proper estimation without this information, better results are obtained when this parameter is included in the models. A similar behaviour is observed for the linear regressions.

**CONCLUSIONS**

Solar radiation can be adequately estimated by linear models and neural networks from values of meteorological variables of routine use, although NN produced better estimates.

Neural networks are an efficient methodology to estimate daily solar radiation using a reduced number of meteorological parameters; in particular, they reproduced the solar radiation evolution patterns for Salta (Argentina).

Even though linear regressions produce good estimates of daily global solar radiation, predictions are strongly correlated to the data set used.

Relative sunshine duration is a key variable involved in the calculation procedures of several agricultural and environmental indices. Estimation of surface incoming solar radiation is therefore essential, and models such as the one proposed might prove extremely useful.

**ACKNOWLEDGEMENT**

The production of this manuscript was supported in part by Secretaría de Ciencia y Tecnología de la Universidad Nacional de Córdoba (SECyT-UNC). The authors are grateful to Meteorólogo Ignacio Nieva (Estación Experimental Agropecuaria - INTA Cerrillos, Salta, Argentina) for providing data used in this paper.

**LITERATURE CITED**

Al-Alawi, S., and H. Al-Hinai. 1998. An ANN-Based approach for predicting global radiation in locations with no direct measurement instrumentation. Renewable Energy 14:199-204.

Almorox, J., M. Benito, and C. Hontoria. 2008. Estimation of global solar radiation in Venezuela. Interciencia 33(4):280-283.

Ashish, D., G. Hoogenboom, and R.W. McClendon. 2004. Land-use classification of grey-scale aerial images using probabilistic neural networks. Transaction ASAE 47:1813-1819.

Ball, R., L. Purcell, and S. Carey. 2004. Evaluation of solar radiation prediction models in North America. Agronomy Journal 96:391-397.

Bocco, M., G. Ovando, and S. Sayago. 2006. Development and evaluation of neural network models to estimate daily solar radiation at Córdoba, Argentina. Pesquisa Agropecuaria Brasileira 41(2):179-184.

Ceballos, C., M. Botino, y R. Righini. 2005. Radiación solar en Argentina estimada por satélite: algunas características espaciales y temporales. *In* IX Congreso Argentino de Meteorología, Buenos Aires, Argentina. Octubre 2005. Ed. Universidad Nacional de Buenos Aires, Buenos Aires, Argentina.

De La Casa, A., G. Ovando, y A. Rodríguez. 2003. Estimación de la radiación solar global en la provincia de Córdoba, Argentina, y su empleo en un modelo de rendimiento potencial de papa. RIA 32:45-61.

Dixon, B., and N. Candade. 2008. Multispectral land-use classification using neural networks and support vector machines: one or the other, or both? International Journal of Remote Sensing 29:1185-1206.

Falayi, E.O., J.O. Adepitan, and A.B. Rabiu. 2008. Empirical models for the correlation of global solar radiation with meteorological data for Iseyin, Nigeria. International Journal of Physical Sciences 3(9):210-216.

Folhes, M., C.D. Rennó, J.V. Soares, and B.B. Silva. 2006. Comparing net surface radiation estimation from remote sensing to field data. *In* Anais - III Simpósio Regional de Geoprocessamento e Sensoriamento Remoto, Aracaju. 25-27 Octubre 2006. Embrapa Tabuleiros Costeiros, Aracaju, Sergipe, Brasil.

Foody, G.M. 2000. Mapping land cover from remotely sensed data with a softened feed forward neural network classification. Journal of Intelligent & Robotic Systems 29:433-449.

Fortin, J., F. Anctil, L. Parent, and M. Bolinder. 2008. Comparison of empirical daily surface incoming solar radiation models. Agricultural and Forest Meteorology 148:1332-1340.

INTA, Estación Experimental Agropecuaria Salta. 2009. Available at http://www.inta.gov.ar/prorenoa/met/estac_conv_cerrilosinta_resumen.htm (accessed October 2009).

Mas, J.F., and J.J. Flores. 2008. The application of artificial neural networks to the analysis of remotely sensed data. International Journal of Remote Sensing 29:617-663.

Mohandes, M., S. Rehman, and T. Halawani. 1998. Estimation of global solar radiation using artificial neural networks. Renewable Energy 14:179-184.

Podestá, G., L. Núñez, C. Villanueva, and M. Skanski. 2004. Estimating daily solar radiation in the Argentine Pampas. Agricultural and Forest Meteorology 123:41-53.

Santamouris, M., G. Mihalakakou, B. Psiloglou, G. Eftaxias, and D. Asimakopoulos. 1999. Modeling the global solar radiation on the Earth surface using atmospheric deterministic and intelligent data-driven techniques. Journal of Climate 12:3105-3116.

Togrul, I., and H. Togrul. 2002. Global solar radiation over Turkey: Comparison of predicted and measured data. Renewable Energy 25(1):55-67.

USDA-ARS. 2007. Software Solar-Calc. United States Department of Agriculture, Agricultural Research Service, Washington DC, USA. Available at http://www.ars.usda.gov/services/software/download.htm?softwareid=62 (accessed June 2009).

Verger, A., F. Baret, and M. Weiss. 2008. Performances of neural networks for deriving LAI estimates from existing CYCLOPES and MODIS products. Remote Sensing of Environment 112:2789-2803.

Walthall, C., W. Dulaney, M. Anderson, J. Norman, H. Fang, and S. Liang. 2004. A comparison of empirical and neural network approaches for estimating corn and soybean leaf area index from Landsat ETM+ imagery. Remote Sensing of Environment 92:465-474.

Wloczyk, C., and R. Richter. 2006. Estimation of incident solar radiation on the ground from multispectral satellite sensor imagery. International Journal of Remote Sensing 27:1253-1259.

Received: 28 July 2009.

Accepted: 02 November 2009.