Predicting Success in Product Development: The Application of Principal Component Analysis to Categorical Data and Binomial Logistic Regression

Critical success factors in new product development (NPD) in Brazilian small and medium enterprises (SMEs) are identified and analyzed. Critical success factors are best practices that can be used to improve NPD management and performance in a company. Traditionally, these factors are identified through surveys, and the collected data are subsequently reduced through multivariate analysis. The objective of this work is to develop a logistic regression model for predicting the success or failure of new product development. This model allows for an evaluation and prioritization of resource commitments. The results will be helpful for guiding management actions as one way to improve NPD performance in those industries.


Introduction
In the management of NPD, the identification of critical success factors (CSFs) that contribute to increasing the probability of success of the new product is a traditional line of research with numerous reference studies (Cooper and Kleinschmidt, 1995; Song and Parry, 1996; Souder et al., 1997; Song et al., 1997; Ernst, 2002; Cooper et al., 2004a; Kahn et al., 2006; Song and Noh, 2006; Barczak et al., 2009; Kahn et al., 2012).
The traditional method for identifying these factors is the use of surveys in which data are collected with respect to a large set of potential variables. Subsequently, these data are reduced using a traditional multivariate analysis, correlating the specific factors to the success of the new product (Cooper and Kleinschmidt, 1995; Song et al., 1997; Song and Noh, 2006).
Based on the assumption that the literature concerning the NPD critical factors is thoroughly explored, the present study aims to go one step further by proposing a quantitative model capable of predicting the result of the new product. Therefore, this study seeks to fill a gap in the published literature, where there is a lack of studies utilizing more sophisticated techniques to predict the success or failure rate in NPD (Berkowitz et al., 2007). Sophisticated techniques can be used to identify those models able to deal with problems involving a large number of variables of different types. The large number of variables reflects the need for the development of multivariate statistical models capable of dealing with problems related to a wide variety of controllable and uncontrollable factors, which may lead to the success or failure of NPD.
The present study proposes a multivariate statistical model to assist project managers in predicting the outcome of new products. The use of this model may probabilistically classify a new project as either a success or a failure. Simulations, such as "what would happen if", can be performed to identify critical areas for the efficient allocation of resources.
Classification problems arise in many fields. In business management, the most common applications are in finance (Chen, 2011), marketing (Kaefer et al., 2005) and human resource management (Sexton and Mcmurtrey, 2005). The strong interest in classification problems has motivated many researchers (Duda and Hart, 2001; Bernadó and Garrell, 2003) to develop quantitative methods for this purpose. Linear discriminant analysis (LDA) (Johnson and Wichern, 2002) was the first method developed to solve classification problems from a multivariate perspective. In addition to LDA, other multivariate statistical tools (Flury and Riedwyl,
functional project structures. The benefits of more organic structures (pure and matrix) have been reported (Larson and Gobeli, 1988). However, other studies (Lee et al., 2000; Yap and Souder, 1994) suggest adopting a contingency perspective. Barczak et al. (2009) suggest that companies with superior performance in NPD have formal processes that are divided into stages and activities. Many studies (Cooper and Kleinschmidt, 1995; Brown and Eisenhardt, 1995; Cooper et al., 2004c) have indicated that the quality of execution of some NPD activities has a positive influence on new product success. Among these activities, the fuzzy front end (FFE) activities are highlighted (Griffin, 1997; Ernst, 2002; Kahn et al., 2006). Idea generation, technical and marketing studies, and the assessment of project viability (technical and economic) are the FFE activities that contribute most to NPD success.

Multivariate data analysis
Survey data analysis requires special attention from researchers because of the considerable number and variety of measures used to assess the constructs in the research conceptual model. The number of variables reflects the multivariate aspect of the research problem and, therefore, of the conceptual model proposed to measure it. The typology reflects the classification of the indicators (quantitative or qualitative) of these constructs. The treatment of multivariate data is a relevant contribution to the construction and expansion of theories by identifying the complex causal relationships between the variables and constructs involved (Devellis, 2012). Techniques such as cluster analysis, principal component analysis, factor analysis, discriminant analysis, structural equation modeling and others have been used in various research fields, including biology, medicine, agronomy, engineering, business and the social and behavioral sciences (Manly, 2005; Lattin et al., 2011; Brown et al., 2011; Devellis, 2012). One of the merits of these techniques is the identification of specific and precise issues of considerable complexity within a data set, transforming multidimensional information into two- or three-dimensional information (Manly, 2005; Hair et al., 2006; Lattin et al., 2011).
The multivariate analysis techniques can be classified into two groups: 1) dependence techniques, in which one or more variables can be explained by other independent variables, and 2) interdependence techniques, in which no variable or set of variables is treated as either dependent or independent. Dependence techniques include multiple regression, structural equation modeling, discriminant analysis and conjoint analysis. Interdependence techniques
The alignment among the NPD strategy, the new product projects and the main company strategies is a success factor (Cooper and Kleinschmidt, 1995; Song and Noh, 2006; Acur et al., 2012). Marketing skill is the ability of a company to detect, assess and use appropriate information about customers, markets, competitors and external environmental forces. A market-oriented approach has been identified as a success factor (Cooper and Kleinschmidt, 1995; Song et al., 1997; Langerak et al., 2004; Haverila, 2010).
The technology strategy must also be linked to the company strategy to address technical capabilities and market opportunities during the NPD (Zapata and Cantú, 2008). One decision in the technology strategy is the selection of the technology sources. According to Scott (1999), the technology sources can contribute to the success or failure of a new product because they require different capabilities of companies with regard to the acquisition, adaptation, management and integration of the given technology with systems and people.
Company skills are defined as distinctive capabilities that assist the company in the execution of the NPD process, directly affecting the quality of the tasks performed. The technical and marketing skills of the company have an influence on the success of the new product (Song et al., 1997; Song and Noh, 2006).
The roles of key individuals are also frequently cited as success factors. Among these factors, the project leader's skill and top management's support are highlighted (Brown and Eisenhardt, 1995; Lee et al., 2000). The project leader performs important tasks, such as facilitating communication between the project team members and top management, negotiating the allocation of resources and seeking to keep the project team members motivated and focused on their responsibilities (Thieme et al., 2003). Top management also plays an important role because it is responsible for the strategic direction of NPD and the allocation of human and financial resources (March-Chordà et al., 2002).
The organization of NPD is a recurrent theme in studies on critical factors for NPD. The emphasis is on cross-functional integration and on the organization of project teams (Lee et al., 2000; Ernst, 2002). Several studies (Griffin, 1997; Souder et al., 1997; March-Chordà et al., 2002; Cooper et al., 2004a; Sherman et al., 2005; Jugend and Silva, 2012) suggest that cross-functional integration has a positive influence on the new product outcome. Integration between NPD and marketing has a positive effect on new product success (Yap and Souder, 1994).
There are several different organizational structures for the project team. The main structures are the pure, matrix or
caused by the assumption that the probability distribution of the categorical variables, especially the nominal variables, is known. Therefore, methods for the extraction of factors, such as weighted least squares (WLS) or unweighted least squares (ULS), are used in exploratory factor analysis because these approaches do not require prior knowledge regarding the probability distribution of the variables (Flora and Curran, 2004; Johnson and Wichern, 2002). However, the methods for extraction of the factors for categorical variables still require further study (Jöreskog and Moustaki, 2001; Wirth and Edwards, 2007; Forero et al., 2009).
Accordingly, principal component analysis was selected as a method for reducing the variables to be considered as independent variables in a logistic regression model. However, to resolve the inapplicability of traditional PCA to categorical variables, categorical principal component analysis (CATPCA) was used, that is, principal component analysis with optimal nonlinear scaling for nominal and ordinal data (Meulman et al., 2004).
In general, CATPCA assigns numerical quantifications to each of the categories of the qualitative variables, thereby allowing a later principal component analysis of the transformed variables (Meulman, 1992; 1998). The numerical values assigned to each category of the original variables are defined by an iterative procedure, known as the alternating least squares method, such that the numerical quantifications possess metric properties (Moroco, 2003).
While traditional PCA assumes linear relationships between variables, CATPCA allows for the measurement of nonlinear relationships between variables. Moreover, CATPCA does not require that the variables be normally distributed. Thus, CATPCA can also be considered a method for reducing the dimensionality of the data (Meulman, 1992; 1998; Meulman et al., 2004).
Logistic regression (LR) is a multivariate statistical technique used in situations that require the prediction of the occurrence or nonoccurrence of a certain characteristic of the response variable from a set of independent variables. LR is similar to the linear regression model but is set apart by its dependent variable type, which is categorical (binary or multinomial) (Harrel, 2001; Hosmer and Lemeshow, 2000). Additionally, LR does not make certain basic assumptions
may consist of factor analysis, cluster analysis, principal component analysis, correspondence analysis and multidimensional scaling (Hair et al., 2006).
Although most multivariate techniques provide more accurate and reliable results when variables are quantitative (discrete or continuous), this property does not mean that techniques such as cluster analysis and factor analysis cannot be used with qualitative variables (categorical or nominal). However, the relevant literature (Hair et al., 2006) advises caution in the customary use of these multivariate techniques for qualitative variables.
The purpose of the present article is to predict success or failure in NPD. Therefore, the general hypothesis was established that a multivariate statistical model is an interesting alternative for predicting the new product outcome (Bertrand and Fransoo, 2002). In situations that include categorical variables among both the dependent and independent variables, the use of the logistic regression model is recommended (Hair et al., 2006; Lattin et al., 2011).

Categorical Principal Component Analysis
One of the techniques applied to reduce data and enable treatment with other multivariate techniques, such as logistic regression analysis, is principal component analysis (PCA) (Krishnakumar and Nagar, 2008). PCA seeks to construct a new set of variables, called principal components, that are less numerous than the original variables but still adequately summarize the information they contain. Each component is a linear combination of the original variables and seeks to reproduce the maximum variance of the original data. In PCA, there is no underlying explanatory model (Lattin et al., 2011). PCA is predominantly used in the analysis of numerical variables. However, numerous research methods in social sciences and operations management utilize qualitative variables in the proposition of constructs (Devellis, 2012; Brown et al., 2011). In these cases, multivariate statistical techniques are applied, such as multiple correspondence analysis, multidimensional scaling, factor analysis, logistic regression and structural equation modeling (Hair et al., 2006; Brown et al., 2011; Devellis, 2012).
Many studies (Jöreskog and Moustaki, 2001; Flora and Curran, 2004; Vermunt, 2007; Wirth and Edwards, 2007; Forero et al., 2009) report new procedures for the factor analysis of categorical variables. These new procedures are primarily related to the method used to extract the factors. Frequently, in the application of factor analysis, researchers mistakenly employ factor extraction by principal components on categorical variables, which is an error
survey (Forza, 2002) and empirical modeling (Bertrand and Fransoo, 2002). The research steps were based mainly on those indicated by Forza (2002) for surveys.

Sample characteristics
The population of the companies studied is shown in Table 1.
In situations where the population is considered to be small, it is recommended that the sample constitute at least 50% of the population (Yamane, 1967). In the present survey, 62 companies were randomly selected, amounting to more than 50% of the population, and this sample size is therefore considered satisfactory.
The units of analysis in the survey were new product projects developed by the companies in the past five years. Interviews were conducted with the person responsible for NPD (managing partner, manager or engineer). The objective was to explore two types of new product projects in each company: one successful project and one failed project.
After discussions with the NPD project leaders, the target projects were defined together with the specific personnel who would be interviewed. All responses were to be based on records, facts and situations encountered during the product development. The interviews were conditioned on the level of knowledge and responsibility assumed by the respondent during the project execution. In the interviews, a structured questionnaire was adopted as the instrument of data collection.
Data were collected from 112 projects, of which eight were excluded due to missing data. Thus, data were obtained for 104 projects, including 62 successful projects and 42 failed ones. This sample is consistent with the recommendations of Hair et al. (2006), who advised a minimum of 100 cases to ensure more robust results. The proportion of successful to failed projects was important for defining the cutoff point in the procedures for the application of logistic regression. In this case, the cutoff value was established as 0.6.
that are typically adopted in other models, as mentioned in the introduction.
The LR model for p independent variables can be written using expression 1, where P(Y = 1) is the probability of success and B_0, B_1, …, B_p are the coefficients of the regression model.

$$P(Y = 1) = \frac{1}{1 + e^{-(B_0 + B_1 X_1 + \dots + B_p X_p)}} \qquad (1)$$

There is a linear regression model hidden within the logistic regression model. The natural logarithm of the ratio of P(Y = 1) to (1 - P(Y = 1)) provides a linear model in X_i, as can be observed in expression 2.

$$g(x) = \ln\left(\frac{P(Y = 1)}{1 - P(Y = 1)}\right) = B_0 + B_1 X_1 + \dots + B_p X_p \qquad (2)$$

The function g(x) has many desirable properties of a linear regression model. An advantage of LR with respect to linear regression models is that the independent variables may be a combination of continuous and categorical variables. In the present study, binary logistic regression was justified by the fact that all independent variables are ordinal categorical variables and the dependent variable is binary, i.e., it represents the success or failure of product development.
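As a minimal sketch of expressions 1 and 2, the Python snippet below evaluates the success probability of one project and recovers its logit. The intercept, coefficients and questionnaire scores are hypothetical placeholders, not the fitted values of this study; only the 0.6 cutoff comes from the text. The last lines illustrate a "what would happen if" simulation by lowering one score.

```python
import math

def success_probability(intercept, coefs, x):
    """Expression 1: P(Y=1) = 1 / (1 + exp(-g(x)))."""
    g = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-g))

def logit(p):
    """Expression 2: g(x) = ln(P / (1 - P))."""
    return math.log(p / (1.0 - p))

# Hypothetical model with two ordinal predictors scored 1-5 (assumed values).
intercept = -6.0
coefs = [0.8, 0.7]
cutoff = 0.6  # cutoff reported in the study

project = [5, 4]
p = success_probability(intercept, coefs, project)
label = "success" if p >= cutoff else "failure"

# What-if simulation: the first practice is rated 3 instead of 5.
p_what_if = success_probability(intercept, coefs, [3, 4])
label_what_if = "success" if p_what_if >= cutoff else "failure"
```

The logit of the computed probability equals the linear predictor, which is the sense in which a linear model is "hidden" inside the logistic model.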

Method
The research method adopted in this article can be considered explanatory, because it identifies NPD success factors, and predictive, because it assumes that a multivariate statistical model is able to predict the new product outcome. The technical procedures included the research
The questionnaire was structured according to the constructs shown in Figure 1. Each construct was split into independent variables related to successful practices in NPD management. These variables were presented in the questionnaire in the form of affirmative statements. The perception of the respondents regarding the degree of compliance of the practices adopted in these projects was measured on a five-point ordinal categorical scale (1-strongly disagree to 5-strongly agree). To classify the project, a five-point Likert scale was also used. Projects whose performance equaled or surpassed expectations were classified as successful.
The failed projects corresponded to products with performance below expectations.
The questionnaire was pretested during visits to four companies. Adjustments were made to facilitate the understanding of the constructs, the individual variables and the scales used to measure each variable. The internal consistency of the constructs was measured using Cronbach's alpha coefficient (Cronbach, 1951), which represents the reliability of the measurement scale for the constructs adopted. A coefficient greater than 0.7 represents satisfactory reliability of the measurement scale (Nunnally, 1978). The lowest Cronbach's alpha coefficient among the constructs was 0.88, which is considered satisfactory.
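To make the reliability check concrete, the following sketch computes Cronbach's alpha for one construct from item scores, using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of the total score). The item scores are invented for illustration and are not data from this survey.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: list of k lists, each holding one item's scores across respondents."""
    k = len(items)
    item_variance_sum = sum(variance(scores) for scores in items)
    # Total score of each respondent across the k items.
    totals = [sum(row) for row in zip(*items)]
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

# Hypothetical five-point scores for three items answered by five respondents.
construct_items = [
    [4, 5, 3, 2, 4],
    [4, 4, 3, 2, 5],
    [5, 5, 3, 1, 4],
]
alpha = cronbach_alpha(construct_items)
reliable = alpha > 0.7  # threshold cited from Nunnally (1978)
```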

The questionnaire and conceptual model
The questionnaire was based on the conceptual model shown in Figure 1, which was used to guide the present study. This conceptual model was based on the models of Brown and Eisenhardt (1995), Song and Parry (1997), and Souder et al. (1997). The conceptual model suggests that the product strategy, marketing skills, technology sources, company/business skills, project leader skill, functional integration, organization of the project team and quality of execution of NPD activities are constructs of NPD practices (factors) that influence the new product outcome.
The dependent variable of the model is the perceived success of the new product, that is, the result of a comparison of the company expectations with the actual performance of the new product after its launch, considering financial aspects, market share, brand strengthening and the development of new competencies for the company. The independent variables are the practices (individual variables) that form the constructs of the model.
Figure 1. The conceptual model adopted in this study
variance was 50%, which are values considered satisfactory for reducing the number of variables. Another question refers to the eigenvalues. Most of the components had eigenvalues greater than 1.0, representing high explanatory power of the indicators in relation to variance. This behavior is consistent with the Kaiser criterion, which recommends adopting only those components with eigenvalues greater than 1.0 (Kaiser, 1974).
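The two retention rules mentioned here (eigenvalues greater than 1.0 and at least 50% of accumulated variance) can be sketched as follows. The eigenvalues are hypothetical; in practice they would come from the component analysis of each construct, and for standardized variables they sum to the number of variables.

```python
def retained_components(eigenvalues, min_cumulative=0.50):
    """Apply the Kaiser criterion and check the accumulated explained variance."""
    total = sum(eigenvalues)
    ordered = sorted(eigenvalues, reverse=True)
    kept = [ev for ev in ordered if ev > 1.0]  # Kaiser criterion
    cumulative = sum(kept) / total             # share of variance explained
    return len(kept), cumulative, cumulative >= min_cumulative

# Hypothetical eigenvalues for a construct measured by five variables.
n_components, explained, acceptable = retained_components([2.6, 1.2, 0.7, 0.3, 0.2])
```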
Table 3 summarizes, for each construct, the variables selected by the CATPCA and the respective loads of the components.Only the components that reached at least 50% of the explained variance and components that had a load greater than 0.8 were selected.
The rules adopted in implementing the CATPCA permitted a reduction in the number of variables by 28%.These variables were included as a first step in the development of the logistic regression model and they are highlighted in the discussion section.

Logistic regression in the prediction of NPD success
The stepwise method was selected for development of the logistic regression model. Among the stepwise method options, the forward LR (likelihood ratio) was selected. The model was adjusted after four steps. The omnibus test was used to assess the significance of each step, indicating that all steps were significant (p < 0.05) for the model development, as can be observed in Table 4.
The explanatory power of the logistic regression model in each stage is illustrated in Table 5. As can be observed, the stage 4 model reached an overall success rate of 89.4%, with a particularly high success rate for the projects classified as successful.
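A classification table of the kind summarized in Table 5 can be sketched as below: predicted probabilities are compared with the 0.6 cutoff, and hit rates are computed per class and overall. The probabilities and outcomes are invented for illustration and do not reproduce the study's results.

```python
def classification_rates(probs, outcomes, cutoff=0.6):
    """Return hit rates for successes, failures and overall (outcome 1 = success)."""
    predicted = [1 if p >= cutoff else 0 for p in probs]
    hits_success = sum(1 for y, yhat in zip(outcomes, predicted) if y == 1 and yhat == 1)
    hits_failure = sum(1 for y, yhat in zip(outcomes, predicted) if y == 0 and yhat == 0)
    n_success = sum(outcomes)
    n_failure = len(outcomes) - n_success
    return {
        "success": hits_success / n_success,
        "failure": hits_failure / n_failure,
        "overall": (hits_success + hits_failure) / len(outcomes),
    }

# Hypothetical predicted probabilities and observed outcomes for six projects.
rates = classification_rates(
    probs=[0.9, 0.7, 0.55, 0.2, 0.65, 0.1],
    outcomes=[1, 1, 1, 0, 0, 0],
)
```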

Data reduction
CATPCA was used to define the number of variables required to initiate the development of the logistic regression model. The same criteria used to develop the traditional PCA were used in the CATPCA. Important criteria include the database suitability (the number of cases and the correlation pattern between the variables), the definition of the number of components for each construct and the percentage of variance explained by these components. Table 2 illustrates the criteria adopted for implementation of the CATPCA.
Another indicator of the sample adequacy is the KMO test, which measures the degree of correlation between variables. The KMO statistic varies from 0 to 1, and higher values are considered better. Hair et al. (2006) suggest 0.50 as an acceptable level. Considering Table 2, one can observe that all constructs showed a KMO above 0.5, except for the Organization of the Project Team. In this case, all of the variables in this construct were initially included in the logistic regression model. Bartlett's test of sphericity (BTS) showed a significance value below 0.001, rejecting the hypothesis that the correlation matrix is an identity matrix. Based on this scenario, the data sample was considered adequate for implementation of the CATPCA.
To define the number of components, the percentage of variance explained by the components was considered. The variance accumulated by the components was considered acceptable if equal to or greater than 50%. Table 2 shows that the number of components extracted was not greater than two and the minimum accumulated
Where: KMO (Kaiser-Meyer-Olkin) and BTS: Bartlett's Test of Sphericity (p < 0.05)
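For readers who wish to reproduce these adequacy checks, the sketch below computes the overall KMO statistic and Bartlett's chi-square statistic from a data matrix using NumPy. It assumes the scores are treated as numeric and uses randomly generated data, so it illustrates the standard formulas rather than the procedure or software actually used in this study; the p-value would be read from a chi-square distribution with the returned degrees of freedom.

```python
import numpy as np

def kmo_and_bartlett(data):
    """data: array of shape (n_cases, n_variables)."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    # Partial correlations obtained from the inverse correlation matrix.
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    off_diagonal = ~np.eye(p, dtype=bool)
    r2 = np.sum(corr[off_diagonal] ** 2)
    q2 = np.sum(partial[off_diagonal] ** 2)
    kmo = r2 / (r2 + q2)
    # Bartlett's test of sphericity: H0 is that corr is an identity matrix.
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * np.log(np.linalg.det(corr))
    dof = p * (p - 1) // 2
    return kmo, chi2, dof

# Hypothetical data: four correlated variables observed on 100 cases.
rng = np.random.default_rng(0)
common = rng.normal(size=(100, 1))
sample = common + 0.5 * rng.normal(size=(100, 4))
kmo, chi2, dof = kmo_and_bartlett(sample)
adequate = kmo >= 0.5  # threshold cited from Hair et al. (2006)
```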

Variables
Variables | Load of component 1 | Load of component 2
1.1 Evaluation of the market potential for this project was well-conducted | 1.0 | -
1.2 The consumers/clients greatly desired this type of product | 0.9 | -
1.3 User requirements were understood and translated for the product specifications | 0.9 | -
2.1 The product offers the same solutions as the competitors, but with the advantage of a lower

5.2 The project leader had the interpersonal skills needed | 0.0 | -
5.3 The leadership style adopted by the project leader was suitable for its execution, encouraging communication and conflict management | 1.0 | -
5.4 The leadership style allowed for participation of team members | 1.0 | -
5.5 The staff development team was motivated to execute the project | 1.0 | -
6.1 The project included participation from various areas/departments in conducting the technical development activities (product design) | 0.9 | -
6.2 In the project, there was an appropriate degree of integration between manufacturing and R&D | 0.9 | -
6.3 The project included participation from various areas/departments in conducting the activities of generation and selection of ideas | 0.9 | -
6.4 In the project, there was an appropriate degree of integration between Commercial and R&D | 0.8 | -
6.5 The project included participation of various areas/departments in conducting the feasibility analysis activities | 0.8 | -
6.6 The project included participation of various areas/departments in conducting the testing activities of the product/market | - | 1.0
6.7 The project included participation of various areas/departments in conducting the prototype development activities | - | 0.9
7.1 The project activities were performed using a functional structure.

The variables selected by the model in each step are described in Table 6.
The significant variables are those whose equation coefficients are statistically different from zero. The column Exp(B) is the odds ratio, which assesses the contribution of each variable to the classification of a project; in the present study, the reference classification was the successful projects.
As new variables were entered into the model, there was an improvement in its fit, given that the -2LL statistic exhibited successive reductions up to step 4. Estimates of R² indicated that the model of step 4 was probably the best because both statistics exhibited their highest values, with 58% being observed for the Cox and Snell estimate and 79% for the Nagelkerke estimate, as can be observed in Table 7.
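The interpretation of Exp(B) as an odds ratio can be checked numerically: raising a variable by one point adds B to the linear predictor and multiplies the odds of success by exp(B). The coefficient and the starting value of the linear predictor below are hypothetical.

```python
import math

def probability(g):
    """Logistic function applied to the linear predictor g."""
    return 1.0 / (1.0 + math.exp(-g))

def odds(p):
    """Odds of success for a probability p."""
    return p / (1.0 - p)

b = 0.7          # hypothetical coefficient B of one variable
g_before = -0.5  # hypothetical linear predictor before the one-point increase

odds_ratio = odds(probability(g_before + b)) / odds(probability(g_before))
exp_b = math.exp(b)  # value reported in the Exp(B) column
```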
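The two pseudo-R² statistics can be derived from the -2LL values, as sketched below. The null -2LL is computed from the sample composition reported above (62 successes and 42 failures); the model -2LL of 50.0 is a hypothetical value chosen only for illustration, because the figures of Table 7 are not reproduced here.

```python
import math

def null_deviance(n_success, n_failure):
    """-2LL of the intercept-only model, from the class counts alone."""
    n = n_success + n_failure
    ll = n_success * math.log(n_success / n) + n_failure * math.log(n_failure / n)
    return -2.0 * ll

def pseudo_r2(neg2ll_null, neg2ll_model, n):
    """Cox and Snell R2 and its Nagelkerke rescaling."""
    cox_snell = 1.0 - math.exp((neg2ll_model - neg2ll_null) / n)
    max_cox_snell = 1.0 - math.exp(-neg2ll_null / n)
    return cox_snell, cox_snell / max_cox_snell

n_cases = 104
d0 = null_deviance(62, 42)
d_model = 50.0  # hypothetical -2LL of the fitted model (assumed, not from Table 7)
cox_snell, nagelkerke = pseudo_r2(d0, d_model, n_cases)
```

Because the Cox and Snell statistic cannot reach 1 for a binary outcome, the Nagelkerke version divides it by its maximum attainable value, which is why the second estimate is always the larger of the two.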
them into new product specifications, (ii) articulation of the new product project to the company's strategy (product and competitive strategies), (iii) relationships with technology suppliers and (iv) the generation and selection of ideas for new products.
The first significant variable is related to the ability of a company to identify and translate customer needs into product requirements and specifications. This finding is in agreement with previous studies (Souder et al., 1997; Cooper et al., 2004c; Kahn et al., 2006) because successful projects are those in which the market assessments were
Another statistic used to evaluate the goodness of fit of the model is the Hosmer-Lemeshow test, whose significance value must be greater than 0.05 for a good fit of the model to the data. In this case, the test indicated an adequate fit of the model in the four steps performed, as shown in Table 8.
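A simplified sketch of the Hosmer-Lemeshow statistic is given below: cases are sorted by predicted probability, split into groups, and the observed and expected numbers of successes are compared in each group. The data are invented, the number of groups is reduced to two for readability (ten is customary), and the p-value, which would come from a chi-square distribution with groups - 2 degrees of freedom, is not computed.

```python
def hosmer_lemeshow_statistic(probs, outcomes, groups=10):
    """Chi-square-type statistic comparing observed and expected successes."""
    pairs = sorted(zip(probs, outcomes), key=lambda pair: pair[0])
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected number of successes
        observed = sum(y for _, y in chunk)   # observed number of successes
        mean_p = expected / size              # must lie strictly between 0 and 1
        statistic += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
    return statistic

# Hypothetical, well-calibrated predictions: 1 of 5 succeed at 0.2, 4 of 5 at 0.8.
probs = [0.2] * 5 + [0.8] * 5
good_fit = hosmer_lemeshow_statistic(probs, [0, 0, 0, 0, 1, 0, 1, 1, 1, 1], groups=2)
# Same predictions against reversed outcomes: a poorly fitting model.
poor_fit = hosmer_lemeshow_statistic(probs, [1, 1, 1, 1, 0, 1, 0, 0, 0, 0], groups=2)
```

A small statistic (large significance value) indicates that predicted and observed frequencies agree, which is why values above 0.05 are read as a good fit.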

Discussion
In analyzing the individual variables that are important to the conceptual model, four factors (individual variables) are highlighted as significant, including (i) the ability of the company to understand the customer requirements and turn
*(p < 0.05); **(p < 0.1).
Therefore, successful projects are those in which there was a strategic alignment of the project, the market assessments were well-conducted and the user requirements were correctly translated into the new product specifications, as seen in Figure 2. This methodology requires an emphasis on the FFE activities, which must be well-executed, and includes the participation of external sources for technological innovation. Therefore, the significant variables identified by the forecasting model suggest that NPD managers and academics should focus their attention on variables related to constructs such as the product strategy, the target market, the quality of NPD activity implementation and the technology sources.

Final considerations
The new product development process is critical for company survival. With so many practices affecting new product success, it is hard to predict the launch of successful new products. Given the importance devoted to NPD activities, it is critical that NPD managers focus on best practices to help ensure commercial success. This paper presented a method that NPD managers can use to predict new product success prior to launch, allowing the efficient allocation of resources and commitments.
The conceptual model incorporated a number of factors that were considered critical for NPD management. Certain factors were statistically confirmed; however, others did not significantly influence the outcome of the new product, although they are cited in the literature. This finding highlights practices that predict success in launching new products.
well-performed and the requirements of users were correctly translated into the new product specifications, which reinforces the importance of the marketing skills construct and denotes the need for greater quality in the execution of FFE activities.
The alignment between the new product strategy and the competitive strategy of the company is a critical factor for new product success. For the four statements that address the adequacy of long-term strategies and project strategies, the average classifications attributed to successful products were statistically greater than the average classifications assigned to products not achieving success. The alignment between the products to be developed and the competitive strategies of the company was indicated as a success factor, which is in agreement with the studies of Clark and Wheelwright (1993) and Cooper et al. (2004a).
The third significant individual variable recognizes the importance of innovation by incorporating and adapting technologies obtained from external sources, particularly companies with commercial relationships (suppliers of machinery, equipment, materials, components or software). This approach is more common in small and medium-sized technology-based companies, such as those considered in the present study.
The activities associated with the generation and selection of ideas must be carefully managed in NPD. These activities highlight the importance of FFE activities, as indicated by several authors (Kahn et al., 2006; Cao et al., 2011). The management of FFE has a significant impact on the performance indicators of cost, project quality and time to project development and product launch. Further, the high-quality implementation of activities for generating and selecting ideas can facilitate understanding of the characteristics desired
There are several limitations of this research. It is important to note that the study was conducted in specific sectors, which, in this case, were medical devices and process automation devices and, therefore, the results cannot be generalized to all types of industries. A second limitation is that the causal relationship between different NPD practices and the new product outcome still needs to be measured.
From the perspective of mathematical models and multivariate statistics, principal component analysis for categorical data proved to be an important tool for reducing the number of variables entering the logistic regression model. The resulting model was adequate for predicting the success of NPD based on constructs with strictly qualitative measures of performance, obtaining a classification power of approximately 90%. The combined use of mathematical techniques and multivariate statistics was demonstrated to be an indispensable tool in the present study, and this approach should be encouraged in both academic and professional settings.
In future studies, other multivariate techniques, such as artificial neural networks, fuzzy logic and classification trees, could be used to test and compare their respective classification powers in predicting the NPD success.
product was well articulated with competitive strategies for the product and the company of third-party staff to fill the need for skills not existent in the company 1

Figure 2. The constructs confirmed by logistic regression

Table 2. Criteria of the development of CATPCA

Table 3. The component loads by constructs and indicators

Table 4. Omnibus Tests of model coefficients

Table 5. The classification power of the models

Table 6. The variables included in the model

Table 7. The statistics of each step