Rev. signos vol.55 no.110 Valparaíso dic. 2022 

Sección Monográfica

Descriptive Norms for 1,082 Chilean-Spanish Idiomatic Expressions

Normas descriptivas para 1.082 locuciones del español en su variante chilena

Begoña Góngora1 

Andre Gómez-Lombardi2 

Alonso Ortega González3 

1Universidad de Valparaíso, Chile

2Universidad de Valparaíso, Chile

3Universidad de Valparaíso, Chile


Idioms are formulaic expressions that vary along a number of dimensions or psycholinguistic factors. These factors are thought to modulate the way in which the cognitive system stores, accesses and retrieves these expressions from memory. We obtained normative data for 1,082 Chilean-Spanish idioms. A total of 622 volunteers aged 18 to 83 years participated in this research and rated the familiarity, ambiguity, transparency and compositionality of the idioms. We conducted internal consistency, inter-rater reliability and correlation analyses, as well as an ANOVA to evaluate the effect of age on the scores of each psycholinguistic dimension. Results showed high internal consistency for the linguistic dimensions of familiarity, ambiguity and compositionality. However, inter-rater reliability scores were low for all groups and dimensions. Correlational analyses showed positive and significant coefficients among all the linguistic dimensions. Finally, significant differences were observed between all age groups in every psycholinguistic dimension. Results are interpreted in terms of their relevance to the study of figurative language processing.

Keywords: Idioms; psycholinguistic norms; familiarity; compositionality; transparency; ambiguity


Las locuciones son expresiones idiomáticas que varían en una serie de dimensiones o factores psicolingüísticos. Se sugiere que estos factores pueden modular la forma en que el sistema cognitivo almacena, accede y recupera estas expresiones de la memoria. Se obtuvieron los datos normativos de 1.082 expresiones idiomáticas del español chileno. En esta investigación participaron 622 voluntarios de entre 18 y 83 años, que calificaron el grado de familiaridad, ambigüedad, transparencia y composición de las locuciones. Se llevó a cabo un análisis de la consistencia interna, de la confiabilidad interevaluador, un análisis de correlación y un ANOVA para evaluar el efecto de la edad sobre las puntuaciones de cada dimensión psicolingüística. Los resultados mostraron una alta consistencia interna para las dimensiones lingüísticas de familiaridad, ambigüedad y composicionalidad. Sin embargo, la confiabilidad interevaluador fue baja en todos los grupos y todas las dimensiones. Los análisis de correlación mostraron coeficientes positivos y estadísticamente significativos entre todas las dimensiones lingüísticas. Finalmente, se observaron diferencias estadísticamente significativas entre todos los grupos de edad para cada una de las dimensiones psicolingüísticas. Los resultados son interpretados en términos de su relevancia para el estudio del procesamiento del lenguaje figurativo.

Palabras Clave: Locuciones; normas psicolingüísticas; familiaridad; composicionalidad; transparencia y ambigüedad


The present study is framed in the field of language processing and, in particular, in the interpretation of the phraseological units known as idioms. According to Corpas Pastor (1996), phraseological units are very frequent in everyday speech, and learning them constitutes a relevant milestone in language acquisition. Likewise, Ruiz Gurillo (2001) indicates that idioms are the most common form of idiomatic figurative language. For this reason, Sprenger, Levelt and Kempen (2006) point out that all models of language processing should account for how the cognitive system deals with these expressions and what mechanisms are involved in their interpretation.

However, this represents a great challenge, given that idioms are not homogeneous expressions (Papagno & Cacciari, 2003). Indeed, this type of phraseological unit can take a variety of grammatical configurations and varies widely in its constituent linguistic features: familiarity, compositionality, syntactic fixity, transparency and ambiguity (Oliveri, Romero & Papagno, 2004).

How these factors modulate the processing of idioms is still a matter of research and, therefore, it is essential that these features be characterized for each language. With this in mind, this study seeks to develop descriptive norms for 1,082 idiomatic expressions of Spanish in its Chilean variant.

1. Theoretical Framework

As is widely known, languages are determined not only by the rules governing the use of free syntagmas, but also by the presence of prefabricated structures that speakers use in their linguistic productions (Corpas Pastor, 1996). Idioms are an example of this kind of structure and are traditionally defined as:

“stable combinations of two or more terms, which work as a sentence element and whose unitary sense is not a direct function of the meanings of their components” (Casares, 1992: 170).

There are several typologies of idioms, which focus on different aspects. Some focus on the internal features of idioms, taking into account aspects such as the motivation of the units, their fixation, and their idiomaticity (Tristá, 1988). Other typologies focus on morphological and functional aspects, such as the one proposed by Casares (1992), which distinguishes conceptual idioms (i.e., expressions consisting of one or more elements with semantic meaning, such as “break the ice”) from connective idioms (i.e., expressions whose main function is to establish syntactic links, such as the Spanish “con tal que” and “en pos de”). Still others, such as that of Corpas Pastor (1996), are based on the syntactic structure of the expressions and classify them into nominal, adjectival, adverbial, and verbal idioms. It is important to emphasize that some of these expressions can be part of the figurative resources available to the speakers of a community, since they can express something that goes beyond the meaning of their components (Gibbs & Colston, 2012). This explains why standard models of language comprehension, which rely on semantic compositional mechanisms, fail to provide an appropriate account of how these expressions are processed (Caillies & Butcher, 2007; Tabossi, Arduino & Fanari, 2010; Gibbs & Colston, 2012). In fact, since the 1970s, specific models have been proposed to explain how idiomatic expressions are recognized and produced by the cognitive system. These can be grouped into three main approaches: a) non-compositional, b) compositional and c) hybrid.

The non-compositional approach states that idioms are stored as a ‘whole chunk’ or as a ‘single word’, either in the general lexicon or in a separate idiomatic lexicon. Consequently, the notion of lexicalisation plays a central role in this approach (Bobrow & Bell, 1973; Swinney & Cutler, 1979; Gibbs, 1985). However, the main disadvantage of non-compositional models is that they ignore the fact that certain idiomatic expressions can be analyzed (Gibbs, Nayak & Cutting, 1989), and they do not capture the syntactic flexibility or the internal semantic structure that certain idioms exhibit (Titone & Connine, 1999). Furthermore, Burt (1992) suggests that lexicalisation is a premature idea that leaves many issues unresolved. For instance, lexicalisation does not account for how idioms are associated with information previously stored in memory, nor for how or when literal analysis is suspended once an idiom is recognized from its first constituent. Finally, according to Burt (1992), accessing an idiom through a lexical entry is not necessarily a simple or cognitively economical process, since it requires a series of time-consuming mechanisms (e.g., recognizing each letter of the potential initial word of an idiom and, if an idiom is found, checking the match between the letter strings and the idiom stored in memory).

Given these drawbacks of the non-compositional approach to idiom processing, a compositional view has been proposed, which claims that literal and figurative processing involve the same underlying mechanisms (Cacciari & Tabossi, 1988; Gibbs, Bogdanovich, Sykes & Barr, 1997). Titone and Connine (1999) state that a compositional approach is consistent with psycholinguistic studies showing that the meanings of an idiom’s constituents are accessed during its interpretation. However, a compositional analysis does not suffice to properly interpret the meaning of idioms. Nor can it be ignored that idiomatic expressions are highly learned sequences that speakers perceive as holistic units.

Regarding the latter, several proposals aim to account for the unitary nature of idioms while recognizing that their idiomatic meaning is derived from a compositional analysis. Each of these proposals is framed within a hybrid approach and resolves this peculiar duality of idioms in a different way. For instance, some authors propose that idiom processing is mediated by the degree of compositionality of the expression. Consequently, a literal analysis contributes differentially to the interpretation of compositional and non-compositional expressions (Titone & Connine, 1999). Other authors rely on the notion of structural frames and assume either that idioms have their own lexical-conceptual node (Cutting & Bock, 1997) or that they are represented by a superlemma that contains the syntactic specifications of an idiom and is connected to the word configuration of the expression (i.e., simple lemmas) (Sprenger et al., 2006). According to Sprenger et al. (2006), superlemmas are located between concepts and word forms. Finally, there is a proposal based on Kintsch’s connectionist model of language comprehension (i.e., the Construction-Integration model; Kintsch, 1988, 1998). Under this proposal, the mental representations of compositional and non-compositional idioms differ only at the propositional level, and this difference lies in the number of connections established between the idiomatic meaning node and the propositional node (Caillies & Butcher, 2007). The Construction-Integration model (Kintsch, 1988, 1998) assumes that, in networks with greater interconnection, the integration phase is carried out earlier. From this view, it can be inferred that the figurative meaning of compositional idioms is activated before that of non-compositional idioms.

As can be seen from this brief review of idiom processing theories, many variables may underlie the mechanisms by which idiomatic expressions are processed. Indeed, it is important to consider that idioms are not homogeneous expressions (Papagno & Genoni, 2004) and thus vary widely in their linguistic features, such as familiarity, compositionality, syntactic fixation, transparency, and ambiguity (Oliveri et al., 2004).

First, ‘familiarity’ is an important linguistic feature to consider, since several studies have established its influence on idiom processing (Lancker & Kempler, 1987; Schweigert, 1986; Schweigert & Moates, 1988). Traditionally, the familiarity of an idiom has been defined as the frequency with which a listener or reader has been exposed to a given expression. A study conducted by Schweigert (1986) revealed that sentences containing highly familiar idioms are read faster than those containing low-familiarity idioms. Similar findings were reported by Cronk and Schweigert (1992), who found that highly familiar idioms are also read faster in figurative contexts.

Second, the ‘compositionality’ of an idiomatic expression is determined by the degree to which the meaning of its components contributes to its overall interpretation (Nunberg, Sag & Wasow, 1994). Hamblin and Gibbs (1999) assert that idiomatic expressions lie along a continuum from non-decomposable to highly decomposable. For instance, the expression ‘kick the bucket’ is considered a non-decomposable idiom, since there is no correspondence between its components and its figurative meaning (i.e., “to die”). In contrast, the expression ‘pop the question’ is considered highly decomposable, since its figurative interpretation (i.e., “to propose marriage”) is distributed over its two components: ‘pop’ and ‘question’ (Tabossi et al., 2010). As mentioned above, there are proposals suggesting that idiom processing is mediated by the degree of compositionality, and that the interpretation of decomposable and non-decomposable idioms involves different cognitive mechanisms. The processing of the former involves lexical retrieval mechanisms and syntactic analysis, whereas the comprehension of the latter involves mechanisms similar to those activated during the processing of individual words (Titone & Connine, 1999; Tabossi et al., 2010).

According to Nunberg et al. (1994), the degree of compositionality of an idiom determines its syntactic behavior. Traditionally, idiomatic expressions have been considered fixed units (Casares, 1992; Ruiz Gurillo, 2001). Tristá (1988) points out that the syntactic fixation of an idiom is manifested in different ways, and states that it is determined by the impossibility of: a) changing the order of its components; b) introducing elements into the expression; c) substituting some elements for others; and d) modifying the number or gender of a given grammatical category, among other aspects. Despite this, some authors (Gibbs & Gonzales, 1985; Gibbs & Nayak, 1989; Nunberg et al., 1994; Horn, 2003) argue that the syntactic fixation of an idiom is also a matter of degree, since there are idioms that admit variations in their syntactic structure without losing their figurative meaning. Following this assumption, Fraser (1970) elaborated a hierarchy of syntactic flexibility based on the type of transformations that a particular expression can accept. He proposed a seven-level scale (i.e., 0 to 6), where level 6 includes idioms that allow any of the abovementioned operations (i.e., syntactically flexible idioms), while level 0 covers idiomatic expressions that do not admit any operation on their syntactic structure (i.e., syntactically fixed idioms). This view assumes that any theory of idiom processing should account for syntactic behavior. However, despite the existence of some theories in this regard (Sprenger et al., 2006), there is no consensus regarding syntactic flexibility (Tabossi et al., 2010).

As previously stated, idioms may also vary in terms of their transparency or opacity. ‘Transparency’ is defined as the ease with which the motivation for the structure of an idiomatic expression can be recovered (Nunberg et al., 1994) or, in other words, how easy it is to recognize the lexicalisation process of an idiom (Papagno & Genoni, 2004). For example, in the expression ‘breaking the ice’, the word ‘ice’ is clearly identified with a tense emotional environment, by virtue of a cognitive metaphor in which temperature indicates an emotional state (Lakoff & Johnson, 1981), and ‘breaking’ refers to dissolving such a situation. In this example there is a clear association between literal and idiomatic meaning, which makes the expression highly transparent. In contrast, idioms whose relationship between literal and figurative meaning is not so evident are considered opaque. This would be the case of idioms such as ‘break a leg’ or ‘kick the bucket’. Using an acceptability judgment task, Burt (1992) observed that transparent idioms were recognized faster than opaque ones. The difference in reaction times to ‘understand the meaning’ of transparent and opaque idiomatic expressions was interpreted as evidence that idiom processing is sensitive to the complexity of the relationship between the literal meaning of the constituents and the figurative meaning of the expression (Burt, 1992).

Finally, ‘ambiguity’ is associated with the number of possible interpretations of an idiomatic expression (Papagno & Cacciari, 2003; Bonin, Méot & Bugaiska, 2013; Bonin, Méot, Boucheix & Bugaiska, 2017). An idiom is therefore described as ambiguous when it has both a plausible literal and a plausible figurative interpretation. This would be the case with expressions such as “to lift the elbow”, which can be used both in their idiomatic form (i.e., to describe a person who has drunk alcohol in excess; Rassiga, Lucchelli, Crippa & Papagno, 2009) and in their literal form, and whose structure contains no element that distinguishes the two (Tristá, 1988). In contrast, expressions such as ‘go bananas’ or ‘shoot the breeze’ are considered non-ambiguous because a literal interpretation is not possible: the former is syntactically anomalous (i.e., the verb “go” is intransitive and traditionally cannot take a direct complement), and the latter presents a semantic violation (i.e., a breeze is not the kind of object that can normally be an argument of the verb “to shoot”) (Tabossi et al., 2010). However, it is still a matter of debate whether ambiguous and non-ambiguous idioms are comprehended through the same processes (Citron, Cacciari, Kucharski, Beck, Conrad & Jacobs, 2016). A central question about the role of ambiguity is whether the literal meaning is accessed during idiom processing, and whether and when inhibition mechanisms are necessary to deactivate this meaning. According to Tabossi et al. (2010), the evidence on this matter is inconsistent, and thus further research is needed.

Studies aimed at understanding idiom processing should consider the dimensions mentioned above and should also have access to a list of idiomatic expressions characterized in these terms. Nevertheless, there are few normative studies of idioms in languages other than English (Libben & Titone, 2008). After conducting an extensive search, we found normative studies of idioms only for Italian (Tabossi et al., 2010), German (Citron et al., 2016), French (Bonin et al., 2013; 2017), Bulgarian (Nordmann & Jambazova, 2017), and Chinese (Li, Zhang & Wang, 2016).

In this context, the present study was performed to obtain both general and age-group descriptive norms for 1,082 Chilean-Spanish idiomatic expressions. The age-group descriptive norms allowed us to observe, and to hypothesize, a potential effect of age on the rating of idiomatic expressions. Therefore, we included additional analyses that may shed light on this subject for further research.

2. Methods

2.1. Participants

The sample comprised 622 native speakers of Chilean Spanish from the city of Viña del Mar (Female N=425, Mean age=49.24, SD=18.06; Male N=197, Mean age=47.32, SD=17.41), recruited through an open call made by the researchers of the Centro de Investigación del Desarrollo en Cognición y Lenguaje (CIDCL) of the Universidad de Valparaíso, Chile. The total sample was later divided into three groups in order to obtain age-group descriptive norms for the idiomatic expressions (Young N=211, Mean age=27.88, SD=5.52, age range=20-39 years old; Adults N=203, Mean age=49.33, SD=5.83, age range=40-59 years old; Elderly N=208, Mean age=69.00, SD=6.07, age range=60-80+ years old). Participants had to meet a single inclusion criterion: twelve or more years of formal education. Exclusion criteria were a) any diagnosed cognitive impairment or disorder (i.e., mild or major neurocognitive disorder) and b) uncorrected visual or auditory problems that would prevent completion of the tasks. Consequently, ten participants had to be excluded from the sample. Enrollment was voluntary, and all participants gave their written informed consent beforehand. All procedures were approved by the ethics committee of the Universidad de Valparaíso and were implemented in compliance with the ethical principles of the Declaration of Helsinki for research involving human participants.

2.2. Materials

2.2.1. List of Chilean-Spanish Idiomatic Expressions

The list was developed by selecting 1,082 idiomatic expressions from the “Nuevo Diccionario Ejemplificado de Chilenismos y de Otros Usos Diferenciales del Español de Chile” (Morales, 2006). We selected only verbal idiomatic expressions in order to control for any differences that might arise from the cognitive processing inherent to different kinds of idioms, such as nominal, adjectival, or adverbial idioms, among others (Casares, 1992). This decision was made to obtain a homogeneous list of idiomatic expressions. The list was presented using E-Prime software, version 2.0 Professional (Tools, 2007).

2.3. Procedure

Following Bulkes and Tanner’s (2017) procedure, the 1,082 idiomatic expressions were randomly subdivided into five lists so that no participant had to rate the entire list of idioms. In addition, the complete list of Chilean-Spanish idiomatic expressions was rated on four linguistic dimensions: a) Familiarity, b) Ambiguity, c) Compositionality, and d) Transparency. Thus, each participant rated the idiomatic expressions of only one of the five lists, and on a single dimension. For instance, whereas participant 1 rated the familiarity of the idioms in list 1, participant 2 rated the transparency of the idioms in list 4, and so on. The number of idiomatic expressions ranged from 216 to 219 per list and dimension (list/dimension). It is important to highlight that, for the compositionality and transparency dimensions, the total number of idiomatic expressions increased because some expressions have more than one figurative meaning.
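The random subdivision described above can be illustrated with a minimal Python sketch. It is not the authors' actual script: the placeholder idiom identifiers, the fixed seed, and the function name split_into_lists are assumptions introduced here for illustration only.

```python
import random


def split_into_lists(items, n_lists=5, seed=2022):
    """Shuffle the items and deal them into n_lists lists of near-equal size."""
    rng = random.Random(seed)  # hypothetical seed, used only for reproducibility
    shuffled = list(items)
    rng.shuffle(shuffled)
    # Taking every n_lists-th item keeps list sizes within one item of each other.
    return [shuffled[i::n_lists] for i in range(n_lists)]


# Placeholder identifiers standing in for the 1,082 expressions (not the real idioms).
idioms = [f"idiom_{i:04d}" for i in range(1, 1083)]
lists = split_into_lists(idioms)
sizes = [len(lst) for lst in lists]
print(sizes)  # two lists of 217 items and three lists of 216 items
```

An even split of 1,082 expressions gives 216 or 217 items per list. The upper bound of 219 reported above would then be consistent with the extra entries added for expressions that have more than one figurative meaning.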

Familiarity was operationalized as the frequency with which a speaker encounters a particular idiomatic expression and was rated on a five-point Likert scale ranging from 1=unfamiliar to 5=very familiar. Ambiguity was operationalized as the degree to which an idiomatic expression could have a literal interpretation and was also rated on a five-point Likert scale, ranging from 1=not possible to 5=totally possible. Compositionality was operationalized as the degree to which the literal meaning of a single component of a particular idiom contributes to its figurative meaning and was also rated on a five-point Likert scale, ranging from 1=not decomposable to 5=fully possible. Finally, Transparency was operationalized as the “ease with which the motivation for their structure can be recovered” (Nunberg et al., 1994) and was also rated on a five-point Likert scale, ranging from 1=not transparent to 5=fully transparent. The choice of a five-point Likert scale was based on previous studies describing norms for idiomatic expressions (Bonin et al., 2013, 2017; Bulkes & Tanner, 2017). Each list on every dimension was rated by a minimum of 30 participants of each age group (i.e., Young, Adult, Elderly), ranging from 30 to 35. Consequently, each dimension was rated by a minimum of 150 unique participants. The rating process was conducted either collectively or individually, depending on the availability of the participants. Researchers supervised the rating process in order to provide the rating instructions, clarify doubts, and solve any technical issues.

2.4. Data Analysis

Descriptive analyses were conducted to summarize the ratings given to the Chilean-Spanish idiomatic expressions on each linguistic dimension, for both the total sample and each age group. A reliability analysis was performed to obtain a measure of internal consistency (Cronbach’s alpha) for each idiom list and linguistic dimension. In addition, an inter-rater reliability analysis was performed to evaluate the consistency of the ratings given on each dimension by the participants in each age group. Correlational analyses were performed to examine the association between the different linguistic dimensions. To examine a potential effect of age on participants’ ratings within each psycholinguistic dimension, a one-way ANOVA was performed. All analyses were conducted using JASP software (JASP Team, 2019; Version 0.11.1). The normative data is freely available to download at

3. Results

Descriptives. Table 1 shows the descriptive statistics obtained for each dimension of the idiomatic expressions included in this study (i.e., familiarity, ambiguity, compositionality and transparency). The highest mean score is observed in the Ambiguity dimension (3.53) and the lowest in the Familiarity dimension (2.48).

Table 1 Descriptive statistics of idioms dimensions for age group and total sample. 

Dimension          20-39 years old   40-59 years old   60-83 years old   Total sample
                   Mean (SD)         Mean (SD)         Mean (SD)         Mean (SD)
Familiarity        2.43 (0.96)       2.46 (0.84)       2.54 (0.85)       2.48 (0.81)
Ambiguity          3.50 (1.23)       3.48 (1.21)       3.61 (1.02)       3.53 (2.96)
Compositionality   2.62 (0.85)       2.93 (0.71)       3.33 (0.70)       2.96 (0.66)
Transparency       2.71 (0.69)       3.31 (0.77)       3.20 (0.79)       3.08 (0.67)
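As a rough illustration of how descriptive norms of this kind can be derived from raw ratings, the following Python sketch groups scores by idiom and age band and reports the mean and standard deviation of each cell. The tuple layout, the function names and the toy ratings are assumptions made here; the study itself ran its analyses in JASP.

```python
from collections import defaultdict
from statistics import mean, stdev

# Age bands as reported in Table 1.
AGE_GROUPS = [("20-39", 20, 39), ("40-59", 40, 59), ("60-83", 60, 83)]


def age_group(age):
    """Return the label of the age band containing `age`."""
    for label, low, high in AGE_GROUPS:
        if low <= age <= high:
            return label
    raise ValueError(f"age {age} is outside the sampled range")


def describe(ratings):
    """Map each (idiom, group) pair to (mean, SD); group 'Total' pools all ages.

    `ratings` is an iterable of (idiom, rater_age, score) tuples.
    """
    cells = defaultdict(list)
    for idiom, age, score in ratings:
        cells[(idiom, age_group(age))].append(score)
        cells[(idiom, "Total")].append(score)
    # The SD needs at least two ratings; cells with a single rating get None.
    return {
        key: (mean(scores), stdev(scores) if len(scores) > 1 else None)
        for key, scores in cells.items()
    }


# Invented ratings for one hypothetical idiom on a 1-5 scale.
toy = [("idiom_0001", 25, 2), ("idiom_0001", 30, 4),
       ("idiom_0001", 65, 5), ("idiom_0001", 70, 3)]
norms = describe(toy)
```

In this toy case the pooled mean is 3.5, while the two age bands that contain ratings have means of 3 and 4, which mirrors the layout of Table 1 on a very small scale.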

As can be observed in Figure 1, the ratings for the compositionality and transparency dimensions show a normal distribution. In the case of ambiguity, the distribution is asymmetric, with a higher frequency of values representing high ambiguity. For familiarity, a higher frequency of low and medium values can be observed.

Figure 1 Histograms for idioms ratings by dimension for the total sample. 

As shown in Figure 2, the analysis by age group shows a tendency for older groups to rate idioms as more decomposable and transparent, with less dispersion in the data. This tendency is not equally observed for the dimensions of ambiguity and familiarity.

Figure 2 Box-plots for idioms mean ratings by dimension for age group and total sample. 

3.1. Reliability analysis

As mentioned before, Cronbach’s alpha was used to evaluate the internal consistency of each idiom list and linguistic dimension. As in Bonin et al. (2017) and Citron et al. (2015), the reliability for familiarity, ambiguity and compositionality was high. In fact, as can be seen in Table 2, all reliability scores were .96 or higher. These results could be explained by the fact that each list was rated by a minimum of 30 raters.

Table 2 Cronbach's alpha for the different norms by idiom's list. 

List   Familiarity   Ambiguity   Compositionality   Transparency
1      0.98          0.96        0.99               0.99
2      0.97          0.96        0.99               0.99
3      0.98          0.97        0.99               0.99
4      0.98          0.97        0.99               0.99
5      0.98          0.97        0.99               0.99
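To make the link between the number of raters and a high alpha concrete, the sketch below computes Cronbach's alpha on a small matrix and also evaluates the Spearman-Brown form of standardized alpha. It assumes that raters are treated as the "items" of the scale, which the text suggests but does not state, and the toy matrix and the mean correlation of 0.5 are invented values.

```python
from statistics import variance


def cronbach_alpha(matrix):
    """Cronbach's alpha for a matrix with one row per idiom and one column per rater."""
    k = len(matrix[0])  # number of raters, treated as the "items" of the scale
    rater_vars = [variance([row[j] for row in matrix]) for j in range(k)]
    total_var = variance([sum(row) for row in matrix])
    return k / (k - 1) * (1 - sum(rater_vars) / total_var)


def spearman_brown(mean_r, k):
    """Standardized alpha implied by k raters with mean inter-rater correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


# Invented ratings: three idioms (rows) scored by two raters (columns).
toy = [[1, 2], [2, 1], [3, 3]]
print(round(cronbach_alpha(toy), 3))      # 0.667
print(round(spearman_brown(0.5, 30), 3))  # 0.968
```

Under these assumptions, 30 raters who correlate only moderately with one another already yield an alpha close to the values in Table 2, which is in line with the explanation offered above.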

3.2. Inter-rater reliability

Krippendorff’s alpha was used to estimate the reliability of participants’ ratings. We adopted this index because it can be used regardless of the number of observers and is suitable for any measurement level, sample size, and presence or absence of missing data (Hayes & Krippendorff, 2007). Table 3 reports Krippendorff’s alpha inter-rater reliability coefficients for both the whole sample and the age groups. Results show low overall inter-rater reliability (Krippendorff, 2004). Alpha coefficients range between .12 and .46 across all groups and dimensions, and Krippendorff (1980) suggests that coefficients below .67 are unacceptable. The highest coefficients were observed in the familiarity and ambiguity dimensions (i.e., .38 and .34, respectively). The coefficients in these dimensions were also higher for the youngest age group. In contrast, the lowest coefficients were observed in the dimensions of compositionality and transparency (i.e., .15 and .16, respectively). The oldest age group showed the lowest coefficients in compositionality, whereas the youngest age group showed the lowest values in transparency.

Table 3 Krippendorff’s alpha inter-rater reliability for all age groups. 

Age group         Familiarity   Ambiguity   Compositionality   Transparency
20-39 years old   0.46          0.42        0.29               0.13
40-59 years old   0.41          0.37        0.12               0.16
60-83 years old   0.38          0.28        0.09               0.19
Total             0.38          0.34        0.15               0.16
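For readers unfamiliar with the index, the following Python sketch computes Krippendorff's alpha as one minus the ratio of observed to expected disagreement. The text does not state which difference metric was used; the interval (squared-difference) metric is assumed here for brevity, although an ordinal metric is a common choice for Likert ratings. The function name and the toy ratings are illustrative only.

```python
def krippendorff_alpha_interval(units):
    """Krippendorff's alpha with the interval (squared-difference) metric.

    `units` holds one list of ratings per idiom; a missing rating is simply
    left out, so the lists may differ in length.
    """
    units = [u for u in units if len(u) >= 2]  # only pairable ratings count
    n = sum(len(u) for u in units)
    if n < 2:
        raise ValueError("need at least one idiom with two or more ratings")

    def sq_diffs(values):
        # Sum of squared differences over all ordered pairs of distinct positions.
        return sum((a - b) ** 2
                   for i, a in enumerate(values)
                   for j, b in enumerate(values) if i != j)

    observed = sum(sq_diffs(u) / (len(u) - 1) for u in units) / n
    pooled = [v for u in units for v in u]
    expected = sq_diffs(pooled) / (n * (n - 1))
    if expected == 0:
        raise ValueError("all ratings are identical; alpha is undefined")
    return 1 - observed / expected


# Invented example: two idioms, each rated by two raters.
print(round(krippendorff_alpha_interval([[1, 2], [3, 3]]), 3))  # 0.727
```

The pairwise loops are quadratic in the number of ratings, which is acceptable for a toy example but would be slow on full rating matrices.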

3.3. Correlational Analyses

We computed Spearman’s rank correlation coefficient ρ (rho), since the statistical assumptions for a parametric test were not fulfilled in all psycholinguistic dimensions. As shown in Figure 3, all correlations were positive and statistically significant (p<.05). Following Hinkle (2003), small positive correlations were observed between a) familiarity and ambiguity (ρ=.131, p<.001), b) ambiguity and compositionality (ρ=.102, p<.001), and c) ambiguity and transparency (ρ=.070, p=.021). A low positive correlation was observed between familiarity and compositionality (ρ=.467, p<.001). A moderate positive correlation was observed between familiarity and transparency (ρ=.596, p<.001), and a high positive correlation between transparency and compositionality (ρ=.824, p<.001).

Figure 3 Correlation matrix for idioms dimensions. 
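The rank-based coefficient and the verbal labels used in this section can be sketched in Python as follows. Spearman's rho is computed as Pearson's r on average ranks. The cut-offs in strength_label (.30, .50, .70, .90) are an assumption: they follow a widely cited rule of thumb and reproduce the labels reported above, but the text does not list them, and the label "small" for the lowest band follows the wording of this section.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson's product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def strength_label(rho):
    """Verbal label for |rho| under the assumed cut-offs."""
    size = abs(rho)
    if size >= 0.90:
        return "very high"
    if size >= 0.70:
        return "high"
    if size >= 0.50:
        return "moderate"
    if size >= 0.30:
        return "low"
    return "small"


for rho in (0.131, 0.467, 0.596, 0.824):
    print(rho, strength_label(rho))
```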

3.4. Age-effect on participant’s ratings within each psycholinguistic dimension

Participants’ ages ranged from 20 to 83 years, and participants were distributed into three age groups (i.e., 20-39, 40-59 and 60-83 years old). A one-way ANOVA was therefore conducted to examine a potential effect of age on participants’ ratings within each psycholinguistic dimension. Since none of the dimensions met the homoscedasticity assumption, the ANOVA results were estimated using Welch’s correction. As can be seen in Table 4, significant age-group differences (i.e., p < .05) were observed within each psycholinguistic dimension (i.e., familiarity, compositionality, transparency and ambiguity). However, the effect sizes observed for the familiarity and ambiguity dimensions can be interpreted as small (f < 0.1; Cohen, 1988). The magnitude of the age-group effect for compositionality and transparency can be considered medium (0.25 < f < 0.40), with the older age group rating idioms as more compositional and transparent than younger participants did.

Table 4 Age effect on rating for each psycholinguistic dimension (ANOVA). 

Dimension          p-value (a)   Effect size (η²)
Familiarity        < .05*        0.06
Ambiguity          < .001**      0.06
Compositionality   < .001**      0.39
Transparency       < .001**      0.35

a. Welch's correction

* Significant at α=.05

** Significant at α=.01

Multiple comparisons using the Games-Howell post hoc test were performed between young and adult, young and elderly, and adult and elderly participants. We observed differences in all comparisons for both compositionality (p < .001) and transparency (p < .01).
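The Welch correction mentioned in this section replaces the pooled error variance of the classical ANOVA with weights that depend on each group's own variance. A minimal Python sketch of the resulting F statistic and its degrees of freedom is given below. The function name and the toy data are assumptions; the sketch stops short of the p-value, which requires the F distribution and is not available in the standard library, and it is not a substitute for the JASP output reported in Table 4.

```python
from statistics import mean, variance


def welch_anova(groups):
    """Return (F, df1, df2) for Welch's heteroscedastic one-way ANOVA."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    # Each group is weighted by n / s^2, so groups with noisier ratings count less.
    weights = [n / variance(g) for n, g in zip(sizes, groups)]
    total_w = sum(weights)
    grand_mean = sum(w * m for w, m in zip(weights, means)) / total_w
    between = sum(w * (m - grand_mean) ** 2 for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / total_w) ** 2 / (n - 1) for w, n in zip(weights, sizes))
    f_stat = between / (1 + 2 * (k - 2) / (k ** 2 - 1) * lam)
    return f_stat, k - 1, (k ** 2 - 1) / (3 * lam)


# Invented ratings for three hypothetical age groups.
toy_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat, df1, df2 = welch_anova(toy_groups)
print(round(f_stat, 3), df1, round(df2, 3))  # 2.571 2 4.0
```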

4. Discussion

It has been reported that idiom features, such as familiarity and degree of compositionality, among others, have an impact on idiom processing (Bonin et al., 2013). In this context, obtaining norms or other kinds of standards for these psycholinguistic features is very useful for designing experimental paradigms that aim to study how idiomatic expressions are processed and represented in memory (Bonin et al., 2013).

According to the ‘Standards for Educational and Psychological Testing’ (AERA, 2018), standards evolve rapidly, and thus there is an ongoing need to monitor changes in any field of interest (e.g., idiom features). Standards are important in the contexts to which they apply but do not require the use of specific technical methods (AERA, 2018). Moreover, different methodologies can be used to gather information in support of a norm or standard, setting out issues and requirements relevant to almost all tests and testing contexts (AERA, 2018). Hence, the purpose of the present study is to contribute to this research area by establishing descriptive norms for the familiarity, ambiguity, compositionality and transparency dimensions of 1,082 Chilean-Spanish idioms. These norms would provide a generally accepted reference that allows researchers to develop their experimental paradigms following a normative standard based on the descriptive features of Chilean-Spanish idioms.

Regarding the dimension of familiarity, results showed lower average scores than those reported by other studies (e.g., Tabossi et al., 2011; Bulkes & Tanner, 2016; Li et al., 2016; Nordmann & Jambazova, 2017; Bonin et al., 2013, 2017). In this regard, two issues are worth discussing. The first concerns the notion of familiarity, which lacks a single conceptualization, and this has implications for how the dimension is operationalized (Nordmann & Jambazova, 2017). When rating the familiarity of idioms, some studies asked participants to estimate how well they thought the idiomatic expression was known by people like them, regardless of whether they knew it themselves (Tabossi et al., 2011; Bonin et al., 2013; Bonin et al., 2017). In other studies, however, this dimension was associated with the notions of ‘knowledge’ and ‘subjective frequency’. On the one hand, knowledge is the degree to which a person believes they know the overall meaning of an idiom and can therefore explain it verbally (Li et al., 2016). On the other hand, ‘subjective frequency’ refers to how often speakers encounter a particular idiomatic expression in everyday life. As noted above, this study conceptualized idiom familiarity as subjective frequency. The second issue is that all studies use Likert scales to rate idiom familiarity, yet the values and criteria that constitute these scales differ across studies. In this study we used a 5-point Likert scale, following the procedures of Bonin et al. (2013), Bonin et al. (2017) and Bulkes and Tanner (2016).

The observed average ratings for ambiguity were similar to those reported by Bulkes and Tanner (2016) but higher than those obtained by Bonin et al. (2013, 2017), even though these studies used 5-point Likert scales. These results indicate that the idiomatic expressions were interpreted, and subsequently rated, as ambiguous by the general sample. Indeed, the frequency analysis showed that most ratings fall closer to the higher end of the Likert scale, which represents a fully plausible literal meaning. Following Bulkes and Tanner (2016), these results are to be expected since, with few exceptions, idioms lose their literal meaning over time.

Results regarding compositionality are very similar to those reported by Bonin et al. (2013) and Bonin et al. (2017), who also used a 5-point scale to rate this dimension. Compositionality is one of the most typical features of idiomatic expressions, which were traditionally assumed to be lexicalised entities. However, our data do not support this intuitive assertion. Rather, they support Gibbs and Hamblin’s (1999) perspective, which assumes that idioms differ in the degree to which the meanings of their parts contribute to their interpretation and can therefore be classified on a continuum between decomposable and non-decomposable.

Regarding transparency, our results reveal a tendency to rate idioms as transparent. A similar tendency was reported by Citron et al. (2015), although they used a 7-point scale. Nevertheless, it is important to mention that few studies consider semantic transparency as a variable for characterizing idiomatic expressions. According to Citron et al. (2015), transparency may be considered a problematic and unstable variable, since listeners’ and speakers’ ratings are based on intuitions that are mostly derived from knowledge of the idiomatic meaning.

The interrater reliability analysis showed low rating consistency within all linguistic dimensions, which is consistent with the results obtained by Nordmann, Cleland and Bull (2014) and Nordmann and Jambazova (2017). Interrater reliability is particularly low for compositionality and transparency. Accordingly, Nordmann et al. (2014) affirm that compositionality is an abstract linguistic feature. This allows each participant to interpret the feature differently, assigning different semantic weights to the individual components of the idiom. It may also be explained by the observed association between compositionality and transparency. As previously reported, all correlations were positive and statistically significant. However, two particular associations can be distinguished: the first between familiarity and transparency, and the second between transparency and compositionality. Regarding the first, our results are supported by studies reporting that familiar idioms are perceived as more transparent than unfamiliar ones (Abel, 2003). However, they are inconsistent with those reported by Citron et al. (2015) and Tabossi et al. (2011), who found no association between transparency and familiarity. Following Citron et al. (2015), these inconsistent findings may be attributed to the variability of the ratings given to this dimension by each study’s participants. Regarding transparency and compositionality, Nordmann and Jambazova (2017) reported similar findings, stating that these dimensions might be considered synonymous concepts.

The present study understands compositionality as the degree to which an idiom’s components contribute to its idiomatic interpretation. Transparency is understood as the ease with which a given speaker, listener or reader can identify the linguistic process by which a free syntagma has been crystallized or lexicalised, thereby acquiring an idiomatic meaning (Papagno & Genoni, 2004). In this sense, it is expected that highly decomposable idioms will be perceived as more transparent and, conversely, that non-decomposable idioms will be considered more opaque. Our results are in line with those reported by Nordmann and Jambazova (2017), showing a strong and direct association between these two dimensions.

Additionally, we studied a potential effect of age on ratings for familiarity, ambiguity, compositionality and transparency. Interestingly, we found that older participants (i.e., the elderly group) tended to rate idioms as more compositional and transparent than younger participants (i.e., both the young and adult groups). The effect size for these differences can be interpreted as moderate for familiarity and ambiguity (η² = 0.06). For compositionality and transparency, however, the observed effect sizes were η² = 0.39 and η² = 0.35, respectively. Both differences between age groups can be considered very large according to Cohen’s (1988) criteria for effect sizes, under which η² = 0.14 is considered large. Regarding transparency, a potential explanation for this phenomenon may be related to cultural background or prior knowledge about idioms’ origins, which allows older participants to recognize the processes of idiom lexicalization better than younger participants (Sprenger, la Roi & van Rij, 2019). However, to our knowledge, there is no available theoretical support that would allow us to interpret this finding. Therefore, this aspect should be investigated further in future studies, since it offers some insight into age differences that should be considered as variables when studying idiom comprehension.
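The interpretation of the effect sizes in Table 4 can be made explicit with a small Python sketch. It assumes Cohen's (1988) conventional benchmarks for η² (0.01 small, 0.06 medium, 0.14 large). The function name `interpret_eta_squared` and the "negligible" label for values below 0.01 are illustrative choices, not part of the original analysis.

```python
def interpret_eta_squared(eta_sq):
    """Label an eta-squared value using Cohen's (1988) conventional benchmarks."""
    if not 0.0 <= eta_sq <= 1.0:
        raise ValueError("eta squared must lie between 0 and 1")
    if eta_sq >= 0.14:
        return "large"
    if eta_sq >= 0.06:
        return "medium"
    if eta_sq >= 0.01:
        return "small"
    return "negligible"  # label assumed here; Cohen gives no name for this range


# Effect sizes as reported in Table 4.
table4 = {
    "Familiarity": 0.06,
    "Ambiguity": 0.06,
    "Compositionality": 0.39,
    "Transparency": 0.35,
}

labels = {dim: interpret_eta_squared(value) for dim, value in table4.items()}
for dim, label in labels.items():
    print(f"{dim}: eta squared = {table4[dim]:.2f} ({label})")
```

Under these benchmarks, familiarity and ambiguity sit exactly at the medium threshold, while compositionality and transparency exceed the large threshold by more than a factor of two.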

Nordmann and Jambazova (2017) recently reported an age effect on idiom ratings; however, this effect was restricted to variables such as familiarity, literality (or ambiguity), and meaning (or knowledge). Nordmann and Jambazova (2017) observed that older participants were more likely than younger participants to rate the idioms as more familiar, more meaningful, and more literal. Nevertheless, these authors did not consider age a relevant predictor for compositionality classification. Even so, it seems important to highlight that Nordmann and Jambazova (2017) hypothesized that age differences may explain the low interrater reliability observed in their research, something that may also be extrapolated to our results. According to the authors, this hypothesis would be supported by previous research showing an age effect on idiom processing.


The main aim of this study was to provide norms for psycholinguistic dimensions of Chilean-Spanish idioms. To our knowledge, this is the first study that provides a description of familiarity, ambiguity, compositionality and transparency of Spanish idiomatic expressions. Interrater reliability was also estimated for these four psycholinguistic variables. The results showed low interrater reliability, which might be accounted for by the observed age differences between groups in this study.

A second aim of this study was to explore the association between psycholinguistic variables. Our findings showed significant associations between dimensions. Of particular interest are the correlations observed between familiarity and transparency, and between compositionality and transparency, since these relations had only been proposed theoretically but not observed empirically.

Finally, as an additional aim, we examined a potential age effect on participants’ ratings for familiarity, ambiguity, compositionality and transparency. We found that age has a significant effect on ratings for all dimensions, and that this effect is particularly important for compositionality and transparency. Consequently, we recommend considering age differences in future studies. We expect that our results will be useful for further studies on the representation and processing of idioms, in which rigorous control of these psycholinguistic variables is required.


