## Journal of the Chilean Chemical Society

*On-line version* ISSN 0717-9707

### J. Chil. Chem. Soc. vol.51 no.1 Concepción Mar. 2006

#### http://dx.doi.org/10.4067/S0717-97072006000100001


**NEW IDEA FOR THE TOPOLOGICAL INDEX EVALUATION AND TREATISE MULTIPLE REGRESSION WITH THREE INDEPENDENT VARIABLES. SATURATED HYDROCARBONS USED LIKE A MODEL**

**E. CORNWELL**

Departamento de Química Inorgánica y Analítica, Facultad de Ciencias Químicas y Farmacéuticas, Universidad de Chile, Casilla 233, Santiago, Chile

*E-mail: ecornwel@ciq.uchile.cl*

**ABSTRACT**

In the QSRR discipline, a novel and easy-to-use parameter (Vc) was designed to evaluate classical topological indices (W, ^{1}χ, Z, MTI) and two new-generation ones (Xu, ^{1}χ^{h}). The regression between Vc and ^{1}χ^{h} gave a correlation coefficient (r) of 0.9992, a surprisingly high value compared with those commonly found in QSPR/QSAR work. Through the Vc parameter, an approach for treating multiple regression with three independent variables is presented. A model set of 35 saturated hydrocarbons was used.

**INTRODUCTION**

A major part of current research in mathematical chemistry, chemical graph theory and quantitative structure-activity/property relationship studies involves topological indices. Topological indices (TIs) are numerical graph invariants that quantitatively characterize molecular structure.

A graph G = (V, E) is an ordered pair of two sets V and E, the former being a nonempty set and the latter a set of unordered pairs of elements of V. When the elements of V represent the atoms of a molecule and the elements of E represent covalent bonds between pairs of atoms, G becomes a molecular graph. Such a graph depicts the topology of the chemical species. A graph is characterized by graph invariants; an invariant may be a polynomial, a sequence of numbers, or a single number, as in the present article. A single-number graph invariant that characterizes the molecular structure is called a topological index.
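As an illustration of a single-number graph invariant, the following minimal sketch computes two of the classical indices evaluated in this work from their standard textbook definitions: the Wiener index (sum of shortest-path distances over all vertex pairs) and the Randić connectivity index (sum over bonds of 1/sqrt(deg(u)·deg(v))). The hydrogen-suppressed carbon skeletons, the adjacency-list encoding and the function names are assumptions made for the example, not taken from the article.

```python
from collections import deque
from math import sqrt


def wiener_index(adj):
    """Sum of shortest-path (bond-count) distances over all unordered vertex pairs."""
    total = 0
    for source in adj:
        # Breadth-first search gives shortest path lengths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total // 2  # every pair was counted once from each end


def randic_index(adj):
    """Sum over edges (u, v) of 1 / sqrt(deg(u) * deg(v))."""
    total = 0.0
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge only once
                total += 1.0 / sqrt(len(adj[u]) * len(adj[v]))
    return total


# Hydrogen-suppressed carbon skeletons written as adjacency lists (illustrative).
n_butane = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
isobutane = {1: [2], 2: [1, 3, 4], 3: [2], 4: [2]}

print(wiener_index(n_butane), wiener_index(isobutane))  # 10 9
print(round(randic_index(n_butane), 4), round(randic_index(isobutane), 4))  # 1.9142 1.7321
```

Both indices decrease on going from the linear to the branched isomer, which is the sense in which such numbers encode molecular branching.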

The application of graph theory to chemistry and to quantitative structure-property/activity relationships (QSPR/QSAR) has led to the emergence of several important graph-theoretical indices.

The first application of graph-theoretical invariants to structure-property relationship (QSPR) studies was proposed by Wiener^{1)} [Wiener index (W)]. However, it was only after Randić^{2)} proposed a topological index for the characterization of molecular branching [Randić index (^{1}χ)] that a dramatic expansion of studies in the area began. These two indices, together with the Hosoya^{3)} (Z), Schultz^{4)} (MTI), Ren^{5)} (Xu) and Yang and Zhong^{6)} (^{1}χ^{h}) indices, are evaluated here using a new idea based on three physicochemical properties of the molecules: the molar refraction (MR), the critical pressure (Pcr) and the critical volume (Vcr). In a previous report^{7)} I showed that these physicochemical properties correlate well, through a linear relation (y = mx + n), with the logarithm of the retention time relative to n-hexane (log t_{rr}) in GLC analysis.

The novel relation proposed by the author (Vc) is a general idea and was tested on 35 saturated hydrocarbons^{8)} taken as a model. Each hydrocarbon i is characterized by an ordered set (x_{i}, y_{i}, z_{i}), where x_{i} is the molar refraction^{7)} (MR), y_{i} the critical pressure^{7)} (Pcr) and z_{i} the critical volume^{7)} (Vcr). Vc is the Euclidean distance between the set (x_{i}, y_{i}, z_{i}) of each of the other 34 hydrocarbons and the set (x_{o}, y_{o}, z_{o}) of ethane. Choosing another reference hydrocarbon, methane included, gives unsatisfactory results, perhaps because of the differences in molecular structure between each of the 34 hydrocarbons and ethane.

All correlations treated in this paper (y = mx + n) are linear regressions of Vc against the cited topological indices, or of log t_{rr} against each of the other variables under study. Other physicochemical properties cited in my previous paper^{7)} were also tried as the three elements of the set defining Vc, but the results were not as good as those obtained with the three proposed ones (MR, Pcr, Vcr).

Linear regression of Vc against each of the proposed topological indices allows the indices to be ordered by the magnitude of the correlation coefficient (r). This order is the same as the one obtained when the logarithm of the GLC retention time relative to hexane (log t_{rr})^{8)} is correlated with the same set of indices. This indicates that the idea behind Vc (defining Vc from appropriate physicochemical properties) is useful for evaluating topological indices: the second ordering is automatically established by the first.

All regression functions of log t_{rr} vs. (x_{i}, y_{i}, z_{i}), (x_{i}, y_{i}), (x_{i}, z_{i}) and (y_{i}, z_{i}) gave similar R^{2} values, which were also similar to the R^{2} value of the regression of log t_{rr} vs. Vc.

It is necessary to point out that the interpretation of the parameters of a multivariable regression is valid only if the independent variables are in orthogonal form^{9)}, and that the number of independent variables used must be in accordance with the number of cases treated; otherwise the R^{2} value is spuriously high^{10)}. These limitations do not arise in the type of regression used in the present study (y = mx + n), where Vc, log t_{rr} and the topological indices were used for the model set of 35 saturated hydrocarbons.

The results obtained in this work indicate that the Vc parameter can be used to evaluate topological indices applied to other homologous organic series, and to reduce multiple regressions with up to three independent variables to a linear regression of the type y = mx + n.

**PROCEDURE**

The Vc parameter is obtained as the distance (D) between the ordered set (x_{i}, y_{i}, z_{i}) of a particular saturated hydrocarbon and the ordered set (x_{o}, y_{o}, z_{o}) corresponding to ethane, namely (11.48, 50.299, 147.5). The distance D is given by the Euclidean formula, equation (1):

$$D = \left[(x_{i} - x_{o})^{2} + (y_{i} - y_{o})^{2} + (z_{i} - z_{o})^{2}\right]^{0.5} = \mathrm{Vc} \qquad (1)$$

Equation (1) was applied to the 35 saturated hydrocarbons using the values given in columns 5-7 of Table 1 (for i = 0, that is, for ethane itself, equation (1) gives a distance of 0). The resulting Vc values are given in Table 2, column 9. Column 10 gives the Vc values calculated from equation (2), and column 11 the absolute percent error of Vc with respect to Vc_{calculated} (see equation (3)).
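The calculation in equation (1) can be sketched in a few lines. The ethane reference triple is the one quoted in the text; the second triple is a made-up illustration (it is not a row of Table 1), and the function name `vc` is an assumption of this sketch.

```python
from math import sqrt

# Reference triple (MR, Pcr, Vcr) for ethane, as quoted in the text.
ETHANE = (11.48, 50.299, 147.5)


def vc(mr, pcr, vcr, ref=ETHANE):
    """Euclidean distance of (MR, Pcr, Vcr) from the reference triple, equation (1)."""
    return sqrt((mr - ref[0]) ** 2 + (pcr - ref[1]) ** 2 + (vcr - ref[2]) ** 2)


# The reference compound lies at distance zero from itself.
print(vc(*ETHANE))  # 0.0

# Hypothetical triple, shifted by +3 in MR and -4 in Pcr (NOT data from Table 1).
hypothetical = (ETHANE[0] + 3.0, ETHANE[1] - 4.0, ETHANE[2])
print(vc(*hypothetical))  # about 5 (a 3-4-5 right triangle)
```

Because the three coordinates enter the distance unscaled, the property with the largest numerical spread contributes most to Vc.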

The regression of the Vc parameter on ^{1}χ^{h} is defined by equation (2):

$$\mathrm{Vc} = -94.732\,(\pm 2.416) + 149.532\,(\pm 1.010)\;{}^{1}\chi^{h} \qquad (2)$$

R^{2} = 99.84%

r = 0.9992

s.d = 3.3994

F = 21896.8
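A short sketch of how the calculated Vc and the absolute percent error could be obtained from equation (2). The coefficients are those reported above; the input values are hypothetical, and the error is assumed here to be taken relative to the observed Vc, since the text does not state the denominator explicitly.

```python
# Coefficients of equation (2), as reported in the text.
INTERCEPT = -94.732
SLOPE = 149.532


def vc_calculated(chi_h):
    """Vc predicted from the Yang and Zhong index via equation (2)."""
    return INTERCEPT + SLOPE * chi_h


def abs_percent_error(vc_observed, vc_calc):
    """Absolute percent error; ASSUMED to be relative to the observed Vc."""
    return 100.0 * abs(vc_observed - vc_calc) / abs(vc_observed)


# Hypothetical inputs for illustration only (NOT values from Table 2).
chi_h = 2.0
vc_obs = 205.0

vc_calc = vc_calculated(chi_h)
print(round(vc_calc, 3))
print(round(abs_percent_error(vc_obs, vc_calc), 3))
```

The percent error is undefined for the reference compound itself, whose observed Vc is 0.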

**Table 2.** Logarithm of the retention time, topological indices and the parametric index (Vc)

The meanings of the column titles are defined below; * indicates the reference substance.

Here r is the correlation coefficient, s.d. is the standard error of the estimate and F is the Fisher ratio. The analysis of variance of the above correlation is given in Table 3.

**Table 3.** Analysis of variance of the correlation of equation (2)

In this correlation, since the p-value in the ANOVA (Table 3) is less than 0.01, there is a statistically significant relationship between the two variables at the 99% confidence level. The R^{2} statistic indicates that the model explains 99.84% of the variability in Vc, r indicates a strong relationship between the variables, and s.d. shows the standard deviation of the residuals to be 3.399.
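The statistics quoted for a regression of the form y = mx + n (slope, intercept, r, R^{2}, standard error of the estimate and F ratio) follow from the usual least-squares sums. The sketch below uses the standard textbook formulas on a small made-up data set; it is not the Statgraphics procedure used by the author, and the function and key names are assumptions.

```python
from math import sqrt


def simple_linear_regression(x, y):
    """Least-squares fit of y = m*x + n with the usual summary statistics."""
    n_pts = len(x)
    mean_x = sum(x) / n_pts
    mean_y = sum(y) / n_pts
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    m = sxy / sxx                     # slope
    n = mean_y - m * mean_x           # intercept
    ss_res = sum((yi - (m * xi + n)) ** 2 for xi, yi in zip(x, y))
    ss_reg = syy - ss_res             # sum of squares explained by the model
    df_res = n_pts - 2                # residual degrees of freedom

    return {
        "slope": m,
        "intercept": n,
        "r": sxy / sqrt(sxx * syy),
        "r2": 1.0 - ss_res / syy,
        "sd": sqrt(ss_res / df_res),  # standard error of the estimate
        "F": ss_reg / (ss_res / df_res),
    }


# Made-up numbers for illustration only (not data from this study).
fit = simple_linear_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print({key: round(value, 4) for key, value in fit.items()})
```

The p-value in an ANOVA table is then obtained from the F distribution with 1 and N - 2 degrees of freedom, which is not computed in this sketch.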

The relation between Vc calculated by means of equation (2) and the Vc values is defined by equation (3):

$$\mathrm{Vc}_{\text{calculated}} = 0.39046\,(\pm 1.798) + 0.9985\,(\pm 0.0067)\,\mathrm{Vc} \qquad (3)$$

R^{2} = 0.9985

r = 0.9993

s.d = 3.39

F = 21896.12

This correlation is analyzed in the same way as equation (2), but without the ANOVA, which is unnecessary because the percentage errors between the calculated Vc and Vc are small.

**Table 4.** Correlation matrix of the topological indices and the parametric index

Table 4 presents the matrix of all possible pairwise regressions; each matrix term a_{ij} is the correlation coefficient (r), and the linear relation Vc = f(^{1}χ^{h}) gives the largest one (0.9992). Using the r values of the matrix for Vc = f(TIs) (terms a_{8,2} to a_{8,7}), all the TIs considered can be ordered: [Z < W < MTI < ^{1}χ < Xu < ^{1}χ^{h}]. This is the same order obtained from log t_{rr} = f(TIs) (terms a_{2,1} to a_{7,1}). This transitivity property is useful for assessing the correlation of an experimental quantity (log t_{rr}) with topological indices when the matrix of r values for Vc = f(TIs) is known a priori.
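The ordering procedure described above can be sketched as follows: compute r between each index and Vc, sort the indices by r, repeat with log t_{rr} as the target, and compare the two orders. All numbers and the index names "A", "B" and "C" are hypothetical placeholders, not entries of Table 4.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sqrt(sxx * syy)


def rank_indices(target, indices):
    """Index names sorted by increasing r against the target variable."""
    return sorted(indices, key=lambda name: pearson_r(indices[name], target))


# Hypothetical data for five compounds (illustration only).
vc_values = [1.0, 2.0, 3.0, 4.0, 5.0]
log_trr = [0.1, 0.3, 0.4, 0.6, 0.9]
indices = {
    "A": [1, 2, 3, 4, 5],
    "B": [1, 3, 2, 5, 4],
    "C": [5, 1, 3, 2, 4],
}

order_vc = rank_indices(vc_values, indices)
order_trr = rank_indices(log_trr, indices)
print(order_vc, order_trr, order_vc == order_trr)
```

When the two lists coincide, the ranking obtained from Vc alone reproduces the ranking against the experimental retention data, which is the property the text exploits.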

The linear regression (y = mx + n) of log t_{rr} versus Vc is statistically very similar to the multiple regression (log t_{rr} = a + bx + cy + dz) of log t_{rr} versus the independent variables MR (x), Pcr (y) and Vcr (z), but the larger F ratio indicates a greater predictive capacity for the linear model (see Table 5).

**Table 5.** Statistical results of linear and multiple regression model.

These results indicate that, using the concept of Euclidean distance in the space E^{3}, it is possible to reduce a multiple regression with three independent variables to a linear regression of the form y = mx + n, and in this way to avoid both the need to orthogonalize the variables and the dependence on the number of points analyzed^{9, 10)}, the problems mentioned in the introduction.
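When comparing the F ratios of the two models, it may help to recall the standard relation between F and R^{2} for a least-squares fit with an intercept, F = (R^{2}/k) / ((1 - R^{2})/(N - k - 1)), where k is the number of independent variables and N the number of points. The sketch below evaluates it for N = 35 (the number of hydrocarbons in this study) with a hypothetical common R^{2}; the R^{2} value is not taken from Table 5.

```python
def f_ratio(r2, n_points, n_predictors):
    """F statistic of a least-squares fit with intercept, computed from R^2."""
    df_model = n_predictors
    df_resid = n_points - n_predictors - 1
    return (r2 / df_model) / ((1.0 - r2) / df_resid)


N = 35      # number of hydrocarbons in the model set
r2 = 0.95   # hypothetical common R^2 (NOT a value from Table 5)

f_linear = f_ratio(r2, N, 1)    # y = m*x + n with Vc as the single variable
f_multiple = f_ratio(r2, N, 3)  # three independent variables: MR, Pcr, Vcr

# For equal R^2 the ratio depends only on the degrees of freedom: (33/1)/(31/3).
print(round(f_linear / f_multiple, 3))  # 3.194
```

Under these assumptions, two models with the same R^{2} already differ in F by about a factor of three, so part of the larger F of the one-variable model reflects its smaller number of fitted parameters.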

Note. Df denotes degrees of freedom; f denotes "function of". Regressions were performed with the Statgraphics Plus 4 software.

**CONCLUSIONS**

1. Vc is a useful parameter for ordering the TIs according to the r values of their regressions against GLC relative retention times.

2. A multiple regression can be reduced to a linear expression by means of the Vc parametric idea; a maximum of three independent variables is permitted.

3. A very significant linear correlation exists between Vc and ^{1}χ^{h}, which implies a strong dependence of ^{1}χ^{h} on the critical pressure and critical volume of the hydrocarbons. This in turn indicates that ^{1}χ^{h} is defined on a sound topological criterion.

**REFERENCES**

1. Z. Mihalic, N. Trinajstic. J. Chem. Educ. 69, 701-712 (1992).

2. M. Randic. J. Amer. Chem. Soc. 97, 6609-6615 (1975).

3. H. Hosoya. Bull. Chem. Soc. Japan 44, 2332-2339 (1971).

4. H. P. Schultz. J. Chem. Inform. Comput. Sci. 29, 227-228 (1989).

5. B. Ren. J. Chem. Inform. Comput. Sci. 39, 139-143 (1999).

6. C. Yang, C. Zhong. J. Chem. Inform. Comput. Sci. 43, 1998-2004 (2003).

7. E. Cornwell. J. Chil. Chem. Soc. 50, 483-487 (2005).

8. G. Zweig, J. Sherma. "Handbook of Chromatography", CRC Press (1976), page 50.

9. M. Randic. J. Chem. Inform. Comput. Sci. 37, 672-687 (1997).

10. J. G. Topliss, R. P. Edwards. J. Med. Chem. 22, 1238-1244 (1979).