SciELO - Scientific Electronic Library Online



Journal of the Chilean Chemical Society

On-line version ISSN 0717-9707

J. Chil. Chem. Soc. v.50 n.2 Concepción jun. 2005 


J. Chil. Chem. Soc., 50, No. 2 (2005), pp. 483-487





Department of Inorganic and Analytical Chemistry. Faculty of Chemical Sciences and Pharmaceutical Sciences. University of Chile. Casilla 233. Santiago de Chile. E-mail


Using a model set of 27 saturated hydrocarbons, the logarithms of the retention times relative to n-hexane (log trr) were correlated with 11 physico-chemical properties.

Of all the correlations studied with log trr as the dependent variable, the regression of log trr on critical pressure gave the best statistical parameters, indicating that this relationship is the optimal one within the set studied.

The multivariate regressions gave significantly smaller Fisher indices than the one obtained for this relationship.

Principal Component Analysis (PCA) applied to all treated variables indicated that only one linear combination exists with a statistically significant value accounting for almost all of the data variability.


The proposal of a topological index (1, 2) and its comparison with an established one requires a validation process or referencing methodology. The references determine the validity of the proposed index or of a pre-existing one. In the literature, physico-chemical properties are very commonly used as references for the topological indices of a group of substances (3, 4). The evaluation of a topological index comprises two stages: a.- obtaining mathematical regressions, for a set of substances, between the proposed index and one or more physico-chemical properties used as references, and b.- analyzing the statistical parameters of the corresponding regressions. The values of these parameters indicate the strengths and weaknesses of the proposed index. Nevertheless, the intrinsic quality of a topological index is ultimately determined by the difference between the value of the dependent variable calculated from the proposed regression function and its experimental value. A small absolute difference between the two indicates that the topological index captures more of the information contained in the chemical graph (5).

The same applies to the difference between chromatographic retention parameters calculated by means of regression equations and the experimental ones, in this case using physico-chemical parameters instead of the topological index. In QSRR methodology (quantitative structure-retention relationships, which relate molecular structure to chromatographic retention parameters) it is common to evaluate retention parameters obtained by gas-liquid chromatography (GLC) using the physico-chemical references already mentioned. Boiling point is generally used as a reference (6-8); other physico-chemical properties are sometimes used for the same purpose (9).

For identification analysis using GLC-MS, it is important to have a mathematical correlation model with one or more physico-chemical references as independent variables, so that a calculated retention time can be obtained for the homologous substances analyzed and used as an additional identification criterion. This permits the resolution of troublesome cases, for example the analysis of saturated hydrocarbons, in which homologous compounds give rise to non-specific fragments. In such cases overlapping peaks obstruct the identification (10).

There is nothing in the literature to indicate that boiling point, used as a reference, always produces better regression statistics than other physico-chemical references for all homologous series of organic compounds.

The aim of this work is to find, for a model set of saturated hydrocarbons, the optimal physico-chemical reference among a set of physico-chemical properties, such that an optimal regression is obtained for a model GLC retention parameter. To that end, the correlation matrix of the logarithm of the relative retention time (log trr) of the model and its physico-chemical properties was obtained, which allows the best reference to be chosen according to the size of the correlation indices. Two and three references with high correlation indices (r) were then used as independent variables in multivariate regressions of log trr. These multivariate regressions were compared with the linear regression whose independent variable has the optimal r value with respect to log trr.
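The selection step described above can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic (the values of Table 1 are not reproduced here), and the function name `rank_references` and the property names are hypothetical. Candidates are ranked by the absolute value of r, since a strong negative correlation is as useful as a strong positive one.

```python
import numpy as np

def rank_references(y, X, names):
    """Rank candidate reference properties by |r| against the dependent variable y."""
    ranked = []
    for j, name in enumerate(names):
        r = np.corrcoef(X[:, j], y)[0, 1]  # Pearson correlation index
        ranked.append((name, float(r)))
    # Sort by absolute value so that strong negative correlations rank high
    ranked.sort(key=lambda item: abs(item[1]), reverse=True)
    return ranked

# Synthetic stand-in for 27 compounds and two candidate properties (hypothetical)
rng = np.random.default_rng(0)
n_compounds = 27
strong = rng.normal(size=n_compounds)
weak = rng.normal(size=n_compounds)
y = -2.0 * strong + 0.1 * rng.normal(size=n_compounds)
X = np.column_stack([weak, strong])

ranking = rank_references(y, X, ["weak_property", "strong_property"])
for name, r in ranking:
    print(f"{name}: r = {r:.4f}")
```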

Principal Component Analysis (PCA) (11, 12) was used to determine the number and kind of orthogonal PCn variables, each of which is a linear combination of all the original standardized variables. In general terms, this analysis allows the determination of the weight of each variable in the PCn relationships, a reduction in the number of variables, and the identification of variables with related properties.


The model of 27 saturated hydrocarbons (13) was studied with respect to the correlation indices (r) of the regressions between the dependent variable, the logarithm of the gas-liquid chromatography (GLC) retention time relative to that of n-hexane (13) (log trr), and 11 physico-chemical parameters (14) characteristic of the saturated hydrocarbons studied. The physico-chemical properties were obtained from the ChemOffice software (14) database. The data for the variables under study are shown in Table 1, the headings of which [A0; A1 ... A11] signify the following:

A0: (Retention time relative to n-hexane)
A1: (Logarithm of the relative retention time)
A2: (Partition coefficient, [n-octanol / water])
A3: (Molar refractivity, [cm3 / mol])
A4: (Normal boiling point, [pressure = 1 atm, K])
A5: (Freezing point, [pressure = 1 atm, K])
A6: (Critical temperature, [K])
A7: (Critical pressure, [bar])
A8: (Critical volume, [cm3 / mol])
A9: (Heat of formation, [kJ / mol, pressure = 1 atm, 298.15 K])
A10: (Gibbs energy, [kJ / mol, pressure = 1 atm, 298.15 K])
A11: (Ideal gas heat capacity, [J / (mol K), pressure = 1 atm, 298.15 K])

Table 1. Retention times relative to n-hexane and the physico-chemical properties of the hydrocarbons

The elements of Table 1 form the [aij] matrix. This matrix is the basis of the PCA process, in which the A0 elements are not considered. From Table 1 (excluding column A0) the correlation (r) matrix [R] of all pairwise combinations of the variables [A1 ... A11] is obtained. The [R] matrix is presented in Table 2, where the first column (A1) contains the r values of log trr vs. each of the reference elements (physico-chemical parameters) (A1 ... A11). All the regressions considered correspond to the function y = m*x + n. Each of these functions was evaluated on the basis of the following statistical parameters: the index of correlation (r), the standard deviation (s.d.) of the regression and the Fisher index (F). The first column of the [R] matrix shown in Table 2 is the principal motive for this study. For this particular subgroup the complete set of regression analyses was made; see Table 3. According to the data of this table, the regression of log trr on critical pressure (A7) is the best of all, equation (1):

log trr = -0.0092 (± 0.0028) * A7 + 2.8385 (± 0.0896) (1)
r = -0.9884
s.d. = 0.0894
F = 1060.4150
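The three statistical parameters used to judge each regression of the form y = m*x + n can be computed as in the sketch below. It assumes that s.d. is the residual standard deviation with n - 2 degrees of freedom and that F is the usual regression F statistic; the source does not state which definitions its software used. The data are synthetic and the function name `linear_fit_stats` is hypothetical.

```python
import numpy as np

def linear_fit_stats(x, y):
    """Least-squares fit of y = slope*x + intercept with r, s.d. and F."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n_points = x.size
    slope, intercept = np.polyfit(x, y, 1)
    fitted = slope * x + intercept
    sse = float(np.sum((y - fitted) ** 2))         # residual sum of squares
    ssr = float(np.sum((fitted - y.mean()) ** 2))  # regression sum of squares
    dof = n_points - 2                             # assumed degrees of freedom
    return {
        "slope": float(slope),
        "intercept": float(intercept),
        "r": float(np.corrcoef(x, y)[0, 1]),
        "sd": (sse / dof) ** 0.5,
        "F": ssr / (sse / dof),
    }

# Synthetic example: 27 points with a negative slope (not the data of Table 1)
rng = np.random.default_rng(1)
x = np.linspace(25.0, 50.0, 27)
y = -0.05 * x + 2.0 + rng.normal(scale=0.05, size=x.size)
stats = linear_fit_stats(x, y)
print(stats)
```

For a single independent variable, F and r are linked by F = r^2 (n - 2) / (1 - r^2), so the regression with the largest |r| also has the largest F when all regressions use the same number of points.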

Table 2. Correlation index (r) matrix from data of Table 1

This regression presents the highest r (in absolute value) and F statistical indices of all the linear regressions between log trr and each of the physico-chemical properties of the model; it also presents a smaller s.d. index; see Table 3. Comparing it with the regression of log trr on boiling point (a reference frequently used for topological index validation), equation (2), it is clear from the statistical parameters of both regressions that the first is better. This is corroborated by comparing the r, s.d. and F values of the correlation of log trr calculated using equation (1) versus the experimental log trr values (column A1, Table 1), see equation (3), with those of log trr calculated using equation (2) versus the experimental log trr values, see equation (4).

Table 3. Statistical parameters of the linear regressions between the log trr of the hydrocarbons and their physico-chemical properties

The equations (2), (3), (4) referred to above, are as follows:

log trr = 0.0157 (± 6.34 × 10^-4) * A4 - 5.2355 (± 0.2126) (2)
r = 0.9797
s.d. = 0.1170
F = 608.5000

log trr (calculated by equation 1) = 0.9769 (± 0.01702) * log trr (experimental) - 5.5690 × 10^-4 (± 0.0170) (3)
r = 0.9885
s.d. = 0.0880
F = 1060.3640

log trr (calculated by equation 2) = 1.0000 (± 0.0405) * log trr (experimental) + 3.60 × 10^-4 (± 0.0225) (4)
r = 0.9800
s.d. = 0.1170
F = 608.2000


The statistical indices (r, s.d., F) of the multivariate regressions log trr (A6, A7, A8), log trr (A7, A8), log trr (A6, A8) and log trr (A6, A7) are given in Table 4. None of the multivariate combinations presents significantly better regression statistics than the model of equation 1. This is an advantage in the present case, since in multivariate regressions the coefficients of the independent variables are difficult to interpret and unreliable unless orthogonalized factors are obtained (15).
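The comparison between single-variable and multivariate regressions can be illustrated with the sketch below. It assumes the standard definition F = (SSR / p) / (SSE / (n - p - 1)) for p independent variables, which the source does not state explicitly; the data are synthetic and the function name `multivariate_fit_stats` is hypothetical. The second predictor is built to be almost collinear with the first, mimicking the close relationship among the properties studied.

```python
import numpy as np

def multivariate_fit_stats(X, y):
    """Least-squares fit of y on the columns of X (plus intercept) with R, s.d. and F."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_points, p = X.shape
    design = np.column_stack([np.ones(n_points), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    sse = float(np.sum((y - fitted) ** 2))
    sst = float(np.sum((y - y.mean()) ** 2))
    ssr = sst - sse
    dof = n_points - p - 1
    return {
        "coef": coef,                      # intercept first, then one value per column
        "R": float(np.sqrt(max(ssr / sst, 0.0))),
        "sd": (sse / dof) ** 0.5,
        "F": (ssr / p) / (sse / dof),
    }

# Synthetic example: x2 carries almost no information beyond x1
rng = np.random.default_rng(2)
x1 = rng.normal(size=27)
x2 = x1 + 0.1 * rng.normal(size=27)
y = 1.5 * x1 + 0.2 * rng.normal(size=27)

single = multivariate_fit_stats(x1[:, None], y)
double = multivariate_fit_stats(np.column_stack([x1, x2]), y)
print("one variable : R = %.4f, F = %.1f" % (single["R"], single["F"]))
print("two variables: R = %.4f, F = %.1f" % (double["R"], double["F"]))
```

Adding a redundant variable can never lower R, but F is divided by the number of independent variables, so it drops unless the new variable explains a substantial share of the residual variance.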

Table 4. Statistical regression parameters for the multivariate functions

The objective of Principal Component Analysis (PCA) is to reduce the number of variables by replacing the columns of the [aij] matrix (see Table 1) with a small number of linear combinations that retain the variability of the original data, and to group similar variables according to their characteristics. The reduction achieved is given by the number n of PCn components obtained by the PCA method. In general, a maximum of two or three components is obtained, corresponding to two or three linear combinations.

PCA was applied to the original data [aij] according to the following procedure:

a.- The data of Table 1 from A1 to A11, forming matrix [aij], are standardized into a matrix [Z]27x11 such that the average of each column equals 0 and its standard deviation equals 1. For this purpose equation 5 was used (16):

Zij = (aij - m) / s (5)

where m is the mean of column j and s is its standard deviation.

With this procedure, spurious information introduced by the differences in magnitude between the columns of the original matrix [aij] (see Table 1) is eliminated.

b.- From matrix [Z]27x11 the correlation matrix [R]11x11 is obtained, which shows the degree of correlation for every pair of variables. This matrix is symmetrical with respect to the principal diagonal, whose elements (bii) are all equal to 1; see Table 2.

c.- From matrix [R]11x11 the eigenvalue matrix [E]11x11 is obtained, in which the eigenvalues are the main diagonal terms (eii) and the other terms (eij) equal 0.

d.- From matrix [R]11x11 the eigenvector matrix [V]11x11 is obtained. Each column of [V]11x11 is the eigenvector corresponding to one eigenvalue of [E]11x11.
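Steps a to d can be sketched with NumPy as shown below. This is an illustration only: the matrix is a small synthetic one (27 rows, 4 columns driven by a single latent factor) rather than the data of Table 1, the function name `pca_from_correlation` is hypothetical, and the sample standard deviation (n - 1 in the denominator) is an assumption about the standardization.

```python
import numpy as np

def pca_from_correlation(A):
    """Steps a-d: standardize, build the correlation matrix, extract eigenpairs."""
    A = np.asarray(A, dtype=float)
    # Step a: each column gets mean 0 and standard deviation 1
    Z = (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)
    # Step b: correlation matrix of the columns (symmetric, unit diagonal)
    R = np.corrcoef(Z, rowvar=False)
    # Steps c and d: eigenvalues and eigenvectors of the symmetric matrix R
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    return Z, R, eigvals[order], eigvecs[:, order]

# Synthetic data: four columns of very different magnitude, one common factor
rng = np.random.default_rng(3)
latent = rng.normal(size=27)
scales = np.array([1.0, 50.0, -3.0, 200.0])
A = latent[:, None] * scales + 0.1 * np.abs(scales) * rng.normal(size=(27, 4)) + 10.0

Z, R, eigvals, eigvecs = pca_from_correlation(A)
print("eigenvalues:", np.round(eigvals, 4))
```

Because the columns are standardized first, the very different magnitudes of the original columns do not influence the result, and the eigenvalues of R add up to the number of variables.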

Each element of matrix [V]11x11 is a multiplying factor (w) of a standardized variable Z in a PCn linear combination. In this particular case only a single linear combination, PC1, is retained because only one eigenvalue (λ1) is greater than one. For this reason only the factors wn1 of the standardized variables Zn1 are significant, and a single equation accounts for almost all of the variability of the original data; see equation 6.

PC1 = w11 * Z11 + w21 * Z21 + ... + wn1 * Zn1 (6)

The loading factors of PC1 are obtained from equation 7:

I = w * λ^0.5 (7)

where λ is the eigenvalue associated with PC1. The loading factor I represents the correlation index of each standardized variable Z (equation 6) with the PC1 variable. The values of w and I are presented in Table 5.
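The interpretation of the loading factor as a correlation index can be checked numerically. The sketch below uses synthetic data, not the values of Table 5, and the function name `pc1_loadings` is hypothetical; it computes the loadings as the eigenvector elements multiplied by the square root of the largest eigenvalue and compares them with the correlation between each standardized variable and the PC1 scores.

```python
import numpy as np

def pc1_loadings(A):
    """Return standardized data, PC1 scores, PC1 loadings and the largest eigenvalue."""
    A = np.asarray(A, dtype=float)
    Z = (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)  # eigenvalues in ascending order
    lam1 = float(eigvals[-1])             # largest eigenvalue
    w = eigvecs[:, -1]                    # its eigenvector (the w factors)
    scores = Z @ w                        # PC1 value for each compound
    loadings = w * np.sqrt(lam1)          # loading = w * eigenvalue^0.5
    return Z, scores, loadings, lam1

# Synthetic data: 27 rows, four strongly related columns (hypothetical)
rng = np.random.default_rng(4)
latent = rng.normal(size=27)
scales = np.array([1.0, 50.0, -3.0, 200.0])
A = latent[:, None] * scales + 0.1 * np.abs(scales) * rng.normal(size=(27, 4))

Z, scores, loadings, lam1 = pc1_loadings(A)
for j in range(Z.shape[1]):
    r = np.corrcoef(Z[:, j], scores)[0, 1]
    print(f"variable {j}: loading = {loadings[j]:.4f}, r with PC1 = {r:.4f}")
```

The sign of an eigenvector is arbitrary, so all loadings may come out with reversed signs; the equality between each loading and the corresponding correlation holds in either case.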

Table 5. Factors w and I calculated on the basis of the largest eigenvalue obtained from the [R] matrix

The principal component PC1 corresponds to λ1 = 10.4571 and accounts for 95.058 % of the variability of the original data. Two other principal components that are not statistically important, PC2 and PC3, with λ2 = 0.3700 and λ3 = 0.1463 respectively, account for 3.363 % and 1.330 % of the variability. The remaining, smaller eigenvalues are omitted because they are not statistically significant. The accumulated percentage of variability up to and including PC3 is 99.752 %.
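The percentages quoted above follow from the eigenvalues, because the eigenvalues of a correlation matrix of 11 variables add up to 11. The short sketch below reproduces the arithmetic from the three eigenvalues reported in the text and applies the retention criterion used here (eigenvalue greater than one); small differences in the last digit with respect to the reported percentages are presumably due to rounding of the eigenvalues.

```python
# Eigenvalues reported in the text for the first three principal components
eigenvalues = {"PC1": 10.4571, "PC2": 0.3700, "PC3": 0.1463}
n_variables = 11  # trace of the 11x11 correlation matrix

# Share of the total variability carried by each component, in percent
percent = {name: 100.0 * lam / n_variables for name, lam in eigenvalues.items()}
cumulative = sum(percent.values())

# Components retained under the eigenvalue-greater-than-one criterion
retained = [name for name, lam in eigenvalues.items() if lam > 1.0]

for name, value in percent.items():
    print(f"{name}: {value:.2f} %")
print(f"cumulative up to PC3: {cumulative:.2f} %")
print("retained:", retained)
```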

All calculations and procedures were carried out using Statgraphics (17) and MATLAB (18) software, supported by the appropriate bibliography (19). Both programs produced identical mathematical results.


I.- For a model of 27 saturated hydrocarbons, the linear regression between the logarithm of the retention time relative to n-hexane and critical pressure is optimal with respect to the regressions obtained with the other physico-chemical parameters of the model.

II.- All the variables used in this study display a close relationship with one another. This is demonstrated by the PCA, which indicated that only one significant linear combination exists. This fact is corroborated by the size of the elements of the correlation matrix.

III.- For the model proposed, the absolute difference between the experimental log trr and the log trr calculated on the basis of critical pressure is smaller than the corresponding difference for the log trr calculated on the basis of boiling point. This implies that the former has greater predictive power than the latter.

IV.- This study shows that critical pressure is a valid reference for the evaluation of topological indices applied to saturated hydrocarbons.

Note: an expression such as log trr (A6, A7, A8) means that log trr is a function of the variables listed in parentheses.


I greatly appreciate the Spanish-to-English translation by Victoria Hare Cornwell.



1. Z. Mihalić, N. Trinajstić, J. Chem. Educ. 69, 701-712 (1992).

2. M. Randic, J. Zupan, J. Chem. Inf. Comput. Sci. 41, 550-560 (2001).

3. T. F. Woloszyn, P. C. Jurs, Anal. Chem. 65, 582-587 (1993).

4. E. Estrada, L. Rodríguez, J. Chem. Inf. Comput. Sci. 39, 1037-1041 (1999).

5. O. Exner, I. Kramosil, I. Vadjde, J. Chem. Inf. Comput. Sci. 33, 407-411 (1993).

6. T. P. Schultz, J. Chem. Inf. Comput. Sci. 38, 853-857 (1998).

7. B. Ren, J. Chem. Inf. Comput. Sci. 42, 585-868 (2002).

8. B. Ren, J. Chem. Inf. Comput. Sci. 43, 161-169 (2003).

9. E. Estrada, J. Chem. Inf. Comput. Sci. 35, 31-39 (1995).

10. M. Pompe, J. M. Davis, D. P. Samuel, J. Chem. Inf. Comput. Sci. 44, 399-409 (2004).

11. R. C. Graham, "Data Analysis for the Chemical Sciences. A Guide to Statistical Techniques", VCH Publishers, Inc. (1993), pp. 329-346.

12. R. A. Cazar, J. Chem. Educ. 80, 1026-1029 (2003).

13. G. Zweig, J. Sherma, "Handbook of Chromatography", CRC Press (1976), p. 50.

14. CambridgeSoft CS ChemDraw Pro software. CambridgeSoft Corporation, 875 Massachusetts Avenue, Cambridge, MA 021923, USA. Version 4.5, October 28, 1997.

15. M. Randic, J. Chem. Inf. Comput. Sci. 37, 672-687 (1997).

16. N. Gilbert, "Estadística", Edit. Interamericana (1980), p. 107.

17. Statgraphics Plus for Windows 4.0, Professional Version. Copyright 1994-1999 by Statistical Graphics Corp.

18. MATLAB Version 6.00.88, Release 12, September 22, 2000. Copyright 1984-2000, The MathWorks, Inc.

19. D. L. Massart, B. G. M. Vandeginste, S. N. Deming, Y. Michotte, L. Kaufman, "Chemometrics: A Textbook", Elsevier Science Publishing Company, Inc. (1990), pp. 339-370.


Creative Commons License. All the content of this journal, except where otherwise noted, is licensed under a Creative Commons License.