Do Large Firms Pursue More Process Innovation? A Case of Canadian Manufacturing Industries

Abstract: We test the Cohen & Klepper cost-spreading process share hypotheses using unique data from two national innovation surveys (2009 and 2012). To our knowledge, no other study has the same combination as our dataset, in terms of robust data from a mandatory survey, large sample size, diverse measures for innovation output, and no sample selection bias. We use two direct measures of innovation to test the CK hypothesis: R&D expenditure and the number of innovations. An outcome variable that counts the number of innovations can be easier for respondents to recall from memory and may reflect the firm's activities more accurately. Using direct measures of innovation eliminates three forms of bias emanating from patents. Our results show that the CK hypotheses can be supported with the aggregated sample, but the results are weak for separate industries. The count-based process share provides statistically superior results to the expenditure-based process share.


Introduction
Following Schumpeter's argument that larger firms are more innovative than small firms (the "Schumpeterian hypothesis"), a stream of research examines the relationship between firm size and the level of innovation. Large firms have advantages in innovation through ample financial and human resources, the complementarities of R&D and other functional activities, and their cost-spreading advantages. While several studies support the Schumpeterian hypothesis (Scherer 1965; Tsai and Wang 2005; Baumann and Kritikos 2016), other studies show that small firms can prevail in rapid innovation environments because of their flexibility, better communication, and managerial advantages (Stock et al. 2002; Plehn-Dujowich 2009), or that the relationship is inconclusive or non-monotonic (Kumar and Aggarwal 2005; Forés and Camisón 2016).
Another stream of empirical research focuses on the role of firm size on the type of innovation, such as process innovation that reduces the cost of producing existing products or product innovation that creates new or significantly improved products (OECD 2005). From a firm's strategic perspective, process innovation emerges from a strategy of price competitiveness through a search for efficiency and production flexibility with new machinery, while product innovation results from a search for technological competitiveness through market expansion and patenting activity (Vaona and Pianta 2008). Pavitt et al. (1987) and Scherer (1991) show that large firms spend a higher proportion of their R&D expenditure on process innovation than small firms.
Based on several empirical findings, Cohen and Klepper (1996a, 1996b) develop a theoretical model, hereafter called the "CK hypothesis", which argues that large firms have proportionally more process innovations than small firms because the costs of process innovation can be spread over a larger output. New firms in an emerging product market tend to compete on product innovation, but existing firms increasingly engage in cost-reducing process R&D to create a cost advantage as firm size grows. Using Scherer's patent data at the business unit level, Cohen and Klepper show empirically, though weakly, that the share of process innovation tends to increase with firm size. Several subsequent studies test the CK hypothesis, with diverse findings. While Golovko and Valentini (2014) and Choi and Lee (2019) agree with the CK hypothesis in that large firms are more inclined to pursue process innovation, Inkmann (2010) shows the opposite result, a positive relation between firm size and the share of product innovation. Other studies find either no systematic relationship or a non-linear relationship between firm size and the share of process innovation (Arvanitis 1997; Fritsch and Meschede 2001; Fang et al. 2019).
Several explanations can be suggested for the diverse empirical results on the CK hypothesis. Some results might be due to sample selection issues arising from firms in different growth stages of their life cycle (Klepper 1996; Inkmann 2010), while others are related to the econometric techniques for handling endogeneity and sample selection bias (Inkmann 2010; Baumann and Kritikos 2016). For example, while Fritsch and Meschede (2001) do not find any systematic relationship between firm size and the share of process innovation using the Mannheim innovation data of German manufacturing firms, Inkmann (2010) shows a negative relationship with the same dataset by accounting for possible sample selection bias. The most critical reason for the diverse results may be the limited sources of data and the choice of different measures of product and process innovation variables (Arvanitis 1997; Fritsch and Meschede 2001; Inkmann 2010).
For the process and product innovation variables, Cohen and Klepper use a convenience sample of patent data collected by Scherer (1991), who assumes process patents are those whose industry of use is the same as the industry of origin. They define the share of process innovation as the proportion of patents that have been classified as representing process innovations. However, patent data are coarse measures because they comprise a great deal of noise and underrepresent micro and small firms or firms in service industries which are not active in patenting activities (though active in innovation). As Cohen and Klepper state themselves, expenditure on innovation is a better measure of innovation, though the data were not available at the time. Later studies use the Community Innovation Survey (CIS) data, which provide such variables as the number of product and process innovations or simply whether they occurred or not (Vaona and Pianta 2008; Golovko and Valentini 2014), new product sales ratio (Fang et al. 2019), or R&D expenditure for process and product innovations (Fritsch and Meschede 2001; Inkmann 2010; Choi and Lee 2018).[1]
The objective of this study is to empirically test the CK hypothesis using unique data from two national innovation surveys (2009 and 2012) conducted by Statistics Canada. The data used in this study have presumably better measures of innovation with a large sample size and are collected using stratified random sampling across all Canadian manufacturing firms. Under the authority of the Statistics Act in Canada, the survey is mandatory and the respondents may be contacted directly to re-answer the questions that are inconsistent, unreasonable or contradictory. To our knowledge, no other study has the same combination as our dataset, in terms of robust data from a mandatory survey, large sample size, diverse measures for innovation output, and no sample selection bias.
This study uses two direct measures of innovation to test the CK hypothesis: R&D expenditure and the number of innovations. An outcome variable that counts the number of innovations can be easier for respondents to recall from memory and may reflect the firm's activities more accurately.
This study shows that the CK hypothesis has weak support in Canadian manufacturing. Industry matters in the CK hypothesis, but unlike Cohen and Klepper, most subsequent studies do not explicitly consider industry differences and instead include industry dummies (Choi and Lee 2018; Fang et al. 2019). Our finding could be temporal, meaning that at the time of the initial CK study firms had fewer product lines and could therefore spread more costs across output. Our study also shows that the results critically depend on the choice of innovation variables. The share of process innovation using innovation counts is significantly different from the process share using R&D expenditure. Inkmann (2010) stresses the importance of carefully chosen variables in his study rejecting the CK hypothesis, and this study supports his argument.

Cohen and Klepper Hypotheses
Here, we reproduce two of Cohen and Klepper's four hypotheses, which relate the share of process innovations to firm size. R&D investment in process innovation lowers the average cost, and a firm's profit from it depends on the size of its existing buyer base because licensing of process innovation to new buyers is assumed to be unavailable. With R&D investment in product innovation, which generates new product features, on the other hand, a firm can reach new buyers as well as existing buyers. By assuming a specific price-cost margin function, Cohen and Klepper derive the profit-maximizing levels of process R&D (r_1) and product R&D (r_2) within the firm in their equations (1) and (2), where f and g are coefficients, q is a firm's existing output, β_i is the rate of decline in marginal return to R&D of type i innovation, h represents the fraction of existing buyers who purchase the firm's new product, and K is additional output gained from sales and licensing to new buyers. Equations (1) and (2) lead to a new variable p, which represents the proportion of process R&D relative to total R&D:
p = r_1 / (r_1 + r_2)   (3)
Taking the first derivative of p with respect to q, assuming β = β_1 = β_2, yields equation (4), which leads to the first hypothesis.
H1: Within industries, the proportion of process R&D out of total R&D will be an increasing function of the firm's ex ante output.
The second hypothesis relates to the second derivative of p in (3) with respect to q, assuming β ≤ 1.
H2: Within industries, the fraction of R&D a firm devotes to process R&D will rise with the ex ante output of the firm at a decreasing rate.
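The logic behind H1 can be sketched with a worked derivation. The power-law forms used for (1) and (2) below are an assumption, chosen only because they are consistent with the symbol definitions given above; Cohen and Klepper's exact expressions may differ. Equation (3) follows directly from the definition of p, and the derivative in (4) is computed from the assumed forms with f, g, K and β all positive.

```latex
% Illustrative sketch only: the functional forms in (1) and (2) are ASSUMED,
% not copied from Cohen and Klepper. Requires the amsmath package.
\begin{align}
r_1 &= f\, q^{1/\beta_1} \tag{1} \\
r_2 &= g\, (hq + K)^{1/\beta_2} \tag{2} \\
p   &= \frac{r_1}{r_1 + r_2} \tag{3}
\end{align}
% Setting \beta = \beta_1 = \beta_2 and dividing through by f q^{1/\beta}:
\begin{align}
p &= \left[ 1 + \frac{g}{f}\left(h + \frac{K}{q}\right)^{1/\beta} \right]^{-1}, \notag \\
\frac{\partial p}{\partial q}
  &= \frac{g}{f}\,\frac{K}{\beta q^{2}}
     \left(h + \frac{K}{q}\right)^{\frac{1}{\beta}-1} p^{2} \;>\; 0 \tag{4}
\end{align}
```

Under these assumed forms the intuition is visible in the term K/q: as existing output q grows, the additional output K reachable through product R&D becomes small relative to q, so the process share p rises toward an upper limit below one.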

Data
Statistics Canada designs and administers a national firm-level survey called the Survey of Innovation and Business Strategy (SIBS), which collects information on firms' strategic decisions, innovation activities and operational tactics. The firms selected to respond to the survey are extracted from the business register, and the survey uses stratified random sampling by industry and by size. Response is mandatory under the Statistics Act in Canada, which eliminates sample selection bias. The CEO or senior manager is the target respondent. Answers are provided primarily online via an electronic questionnaire. Non-respondents and respondents with inconsistent or contradictory responses are contacted directly by telephone. This survey also includes very small businesses with fewer than 20 employees and sales under $250,000. Statistics Canada linked the SIBS data to the General Index of Financial Information (GIFI), which contains income tax data of each firm surveyed, as well as total sales.[2]
This study utilizes two relevant measures of firm-level innovation: the reported number of product and process innovations made within the firm and the total R&D expenditures on process and product innovations. SIBS contains two direct questions about the number of process and product innovations. Question 48 refers to the total number of process innovations (in 2009), in which a new process is defined as an improved production process, a new or significantly improved logistics, distribution or delivery method, or a new support activity for goods and services. Question 87 relates to the number of new or significantly improved products and services, in which product innovation is defined as "a new or significantly improved good or service in terms of its capabilities, ease of use, components or subsystems."
This study considers NAICS (North American Industry Classification System) sectors 31-33, which contain all manufacturing industries. After cleaning the data, there are 2,990 firms for SIBS 2009 and 2,617 firms for SIBS 2012, with a total of 5,077 firms.[3] To be consistent with the methodology used by Cohen and Klepper, this study uses an average of the three prior years of sales as firm size, which eliminates any bias from unknown operational changes, product line changes, personnel changes, etc., within the firm.
[1] For the measure of firm size, some studies use total sales (Cohen and Klepper 1996; Choi and Lee 2018), while others use the number of employees (Fritsch and Meschede 2001; Baumann and Kritikos 2016; Fang et al. 2019).
[2] Information on the SIBS can be found at http://www23.statcan.gc.ca/imdb/p2SV.pl?Function=getSurvey&SDDS=5171 and on the GIFI at http://www.cra-arc.gc.ca/tx/bsnss/tpcs/crprtns/rtrn/wht/gifi-ogrf/menu-eng.html.
Cohen and Klepper use a convenience sample with patents as the dependent variable because "direct measures of firm process and product R&D expenditures within industries are not available". Patents allocated to process or product categories by Scherer's (1991) approach are subject to three forms of bias. The first is the subjective bias introduced by the classifier and their own idiosyncratic classification system. The second is a downward bias because not all innovations are patented. The third is a further downward bias because most process innovations remain trade secrets (Levin et al. 1987). Cohen and Klepper argue that this problem does not introduce bias so long as the patent propensity in each industry is unrelated to size. However, the simplest way to rectify this problem is to collect the number of process and product innovations at the firm level. This solution was unavailable in 1996 but is available with our data.
We construct two dependent variables: the share of the number of process innovations relative to both process and product innovations, and the share of R&D expenditure on process innovations relative to R&D expenditure on both process and product innovations. Table 1 shows that the average process share is 34.6% for the count variable and 42.5% for the expenditure variable. This share is substantially larger than the share used by Cohen and Klepper (1996b). Table 2 summarizes the process share for 19 three-digit NAICS industries for the two dependent variables. The term "vetted" in the table indicates that the particular cell does not satisfy Statistics Canada's disclosure criteria. This table reports the share of process innovation based on only innovating firms, as indicated in column three. Including non-innovators in Table 2 would bias the reported shares downward, because averaging across all firms distorts both the average expenditure and the mean number of innovations. This treatment also parallels Cohen and Klepper, whose results cover only patenting firms: a firm with a patent is included, and a firm without one is excluded. It is quite conceivable that a firm that has developed product innovations could have zero process innovations, in which case the share of process innovation would equal zero. Or, if the firm has developed only process innovations, the share of process innovation would equal one. Table 2 shows that there are 3,850 firms with at least one innovation. Due to sample size reporting requirements exercised by Statistics Canada, some industries were grouped together in the table.
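The construction of the two dependent variables can be illustrated with a minimal sketch. The firm records below are hypothetical and the function names are introduced here for illustration only; the sketch simply shows how a product-only innovator gets a share of zero, a process-only innovator a share of one, and a non-innovator is left out of the average.

```python
from typing import List, Optional


def process_share(process: float, product: float) -> Optional[float]:
    """Share of process activity in total (process + product) activity.

    Returns None when the firm reports no innovation activity at all,
    because the share is undefined and the firm is excluded from averages.
    """
    total = process + product
    if total <= 0:
        return None
    return process / total


def mean_defined(values: List[Optional[float]]) -> float:
    """Average over firms whose share is defined (innovating firms only)."""
    kept = [v for v in values if v is not None]
    return sum(kept) / len(kept)


# Hypothetical firms:
# (process count, product count, process R&D spend, product R&D spend)
firms = [
    (0, 3, 0.0, 120_000.0),      # product-only innovator -> share 0
    (2, 0, 80_000.0, 0.0),       # process-only innovator -> share 1
    (1, 3, 50_000.0, 50_000.0),  # mixed innovator
    (0, 0, 0.0, 0.0),            # non-innovator -> excluded
]

count_shares = [process_share(pc, pd) for pc, pd, _, _ in firms]
spend_shares = [process_share(ps, pds) for _, _, ps, pds in firms]

print(count_shares)
print(mean_defined(count_shares), mean_defined(spend_shares))
```

In this toy data the mixed firm has a count-based share of 0.25 but an expenditure-based share of 0.5, which illustrates how the two measures can diverge for the same firm.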
In Table 2 we see a striking discrepancy between the process share based on the number of innovations (column 6) and that based on R&D expenditure (column 9). The average share of the former measure (39%) is much lower than the average share of the latter measure (47%), and the difference is at least 0.25 in five industries (differences are in brackets): computers and electronics (0.28), transportation equipment (0.26), food-beverage and tobacco (0.35), textile and textile mills (0.25). This implies the choice of innovation variables may critically affect the results in empirical analysis.

Estimation and Results
We test H1 using a linear function with an intercept and sales as the independent variable, and H2 with a nonlinear function with an intercept, sales and sales squared as independent variables. We recognize that endogeneity (simultaneity bias) is an issue with firm size and innovation; however, this is a replication study. The original paper made no mention of simultaneity bias, and thus to ensure comparability we do not correct for it. We first estimate the model for the whole sample, and then for each NAICS 3-digit manufacturing industry. Cohen and Klepper use a double-censored Tobit estimator to accommodate a dependent variable bounded by zero and one, but the Tobit estimator has fitted values that are not constrained to be within the zero to one interval. Instead, we use a Generalized Linear Model with a logit link function from the binomial family, which ensures that the fitted values fall between zero and one. The CK patent data necessarily suffer from upward bias because their data were collected from firms that had patents, whereas firms without patents were omitted. Our study uses all firms in the sample frame for estimation. Cohen and Klepper address the issue of heteroskedasticity by adding a weight to each business unit and focusing on industries with large numbers of patents. In contrast, we use probability weights from the sampling design, which results in a robust weighted variance calculation. Heteroskedasticity is corrected by reporting weighted Huber/White standard errors. Because the propensity to innovate varies considerably by industry, our results could be biased by heavily innovative industries; thus standard errors for the aggregate regressions are clustered by industry.
Tables 3 and 4 report the estimation results when the process share is measured with the number of innovations and R&D expenditure, respectively. Some industries were grouped together for estimation; this is due to Statistics Canada's disclosure requirements. Rather than discuss the estimation results in detail below, our focus will be on the marginal effects of sales on process share. The CK hypothesis for all manufacturing is supported for process share in terms of the number of innovations (not R&D expenditure). For separate industries, Tables 3 and 4 offer mixed statistical support for H1. Under the heading "Linear specification" in the column "Margin on sales," we see that for count-based process share (Table 3) there are three significant margins out of 19 industries, with one displaying the wrong sign. In comparison, in the Cohen and Klepper linear function, there are four statistically significant coefficients out of 36 industries. In Table 4 under the heading "Linear specification" in the column "Margin on sales," expenditure-based process share has five out of 19 industries that are significant. However, four of the five margins are negative, and thus have the wrong sign.
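To make the estimator concrete, here is a minimal pure-Python sketch of a weighted fractional logit (a GLM with a logit link) with an intercept and one regressor, fitted by Newton-Raphson. It is an illustration under stated assumptions rather than the procedure actually used: the data and weights are hypothetical, the function names are introduced here, and robust or clustered standard errors are not computed. In practice a statistical package would normally be used.

```python
import math


def fit_fractional_logit(x, y, w, iters=50, tol=1e-10):
    """Weighted logit-link fit of a share y in [0, 1] on an intercept and x.

    Solves the quasi-likelihood score equations sum_i w_i (y_i - mu_i) z_i = 0,
    with z_i = (1, x_i), by Newton-Raphson. Returns (intercept, slope).
    """
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, wi in zip(x, y, w):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            r = wi * (yi - mu)        # weighted score contribution
            v = wi * mu * (1.0 - mu)  # weighted variance function
            g0 += r
            g1 += r * xi
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        # Solve the 2x2 Newton system H * step = g in closed form.
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


def fitted(a, b, x):
    """Fitted shares; the logit link keeps every value strictly inside (0, 1)."""
    return [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]


def average_marginal_effect(a, b, x, w):
    """Weighted average of d mu / d x = b * mu * (1 - mu) over the sample."""
    mu = fitted(a, b, x)
    return sum(wi * b * m * (1.0 - m) for wi, m in zip(w, mu)) / sum(w)


# Hypothetical data: scaled sales, process share, and sampling weights.
sales = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
share = [0.0, 0.2, 0.5, 0.3, 0.6, 1.0]
weight = [10.0, 8.0, 6.0, 5.0, 3.0, 2.0]

a_hat, b_hat = fit_fractional_logit(sales, share, weight)
mu_hat = fitted(a_hat, b_hat, sales)
ame = average_marginal_effect(a_hat, b_hat, sales, weight)
print(a_hat, b_hat, ame)
```

The observed shares of exactly zero and one are handled without any censoring adjustment, and the fitted values cannot leave the unit interval, which is the property that motivates replacing the Tobit estimator.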
H2 states that the process share will rise with output at a decreasing rate, which implies the coefficient of the square of sales is negative.
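The link between H2 and the sign of the squared term can be written out for a logit-link model. This is a sketch under assumed notation (s for sales, Λ for the logistic function, and α, γ_1, γ_2 for the fitted coefficients); how the reported margins were actually computed is not spelled out in the text.

```latex
% Assumed notation: s = firm sales, \Lambda = logistic function,
% \alpha, \gamma_1, \gamma_2 = coefficients of the nonlinear specification.
\begin{align}
E[p \mid s] &= \Lambda(\eta), \qquad
\eta = \alpha + \gamma_1 s + \gamma_2 s^{2}, \qquad
\Lambda(\eta) = \frac{1}{1 + e^{-\eta}} \\
\frac{\partial E[p \mid s]}{\partial s}
  &= \Lambda(\eta)\,\bigl(1 - \Lambda(\eta)\bigr)\,(\gamma_1 + 2\gamma_2 s)
\end{align}
```

With γ_1 > 0 and γ_2 < 0 the index η rises with sales at a decreasing rate, which is the pattern H2 predicts. Because Λ(η)(1 − Λ(η)) is always positive, the sign of the marginal effect at any sales level is the sign of γ_1 + 2γ_2 s.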
In the columns of Tables 3 and 4 where "0" occurs, Statistics Canada did not release the coefficient, other than to note it was very small. For the count-based process share (Table 3) under the heading "Nonlinear specification", in the column labelled "Margin on sales" there are five margins that are significant and positive (the correct sign), and in the column labelled "Margin on sales²" there are nine margins that are significant and negative (the correct sign). For comparison, the nonlinear specification from Cohen and Klepper has only one statistically significant coefficient on sales squared out of 36 industries. Turning to the nonlinear specification in terms of expenditure-based process share (Table 4), under "Nonlinear specification" in the column labelled "Margin on sales" three industries have a significant margin, two of which have the wrong sign. In the column labelled "Margin on sales²" five margins are significant, of which four have the wrong sign. The results imply that the veracity of the CK hypotheses critically depends on the choice of innovation variables and on the industry. Cohen and Klepper point out that innovation type differs by industry because of product complexity, and this study shows the sensitivity of the hypotheses by industry.

Conclusion
Our study tests the Cohen and Klepper patent-based process share hypotheses with more comprehensive data collected by Statistics Canada. We use two measures of innovation: the number of innovations and R&D expenditure. The empirical results show that the CK hypotheses can be generally supported with the aggregated sample, but the results are weak if we consider separate industries. Cohen and Klepper indeed show that the hypotheses at the individual industry level are weakly supported in only a few industries, and this study bolsters the results with more comprehensive data. The results also differ markedly between the two dependent variables: the count-based process share provides statistically superior results to the expenditure-based process share. An empirical question for future research is "Why don't large firms have a higher proportion of process innovation?" There is clearly something else happening in the firm contrary to what the CK hypotheses predict.