Semantic relations between collocations: A Spanish case study

Linguistics, as the scientific study of human language, aims to describe and explain it. However, the validity of a linguistic theory is difficult to prove because of the volatile nature of language as a human convention and the impossibility of covering all real-life linguistic data. Despite these problems, computational techniques and modeling can provide evidence to verify or falsify linguistic theories. As a case study, we conducted a series of computer experiments on a corpus of Spanish verb-noun collocations using machine learning methods, in order to test the linguistic claim that the collocations of a language do not form an unstructured collection but are related to one another via what we call collocational isomorphism, represented by the lexical functions of the Meaning-Text Theory. Our experiments allowed us to verify this claim. Moreover, they suggested that semantic considerations are more important than statistical ones in defining the notion of collocation.


INTRODUCTION
Computer experiments play a very important role in science today. Simulations on computers have not only increased the demand for accuracy of scientific models, but have also helped researchers study regions that cannot be accessed in experiments or would require very costly experiments.
In this article, we present computational experiments made on Spanish verb-noun collocations such as seguir el ejemplo ('follow the example'), satisfacer la demanda ('meet the demand') and tomar una decisión ('make a decision'). The purpose of the experiments is to test a linguistic statement concerning collocational semantics. In Section 1, we present this linguistic statement in detail; in Section 2, we describe the language data and methods used in the experiments; in Section 3, we discuss the experimental results in the light of the linguistic statement; and in Section 4, we derive another important inference on the nature of collocation.
It should be added that testing a linguistic hypothesis on computer models not only confirms or rejects the hypothesis, but also motivates the researcher to search for deeper explanations or to explore new approaches in order to improve the computational procedure. Thus, starting from one linguistic model, the researcher can evaluate it and then go further, sometimes into neighboring areas of linguistic reality, in her quest for new solutions, arriving at interesting conclusions. The original intent of our research was to test one linguistic model experimentally. The results provided evidence supporting this model, but they also gave more insight into the nature of collocation, which has been a controversial issue in linguistics for many years.

Case study
Before formulating the linguistic point we are going to test via computer experiments, we will first locate it within the vast realm of linguistics. Our statement concerns the concept of collocation, one of the controversial issues in contemporary theoretical and applied linguistics. Knowledge of collocation is very important in lexicology (Herbst & Mittmann, 2008), translation (Boonyasaquan, 2006), language acquisition (Handl, 2008), and various tasks of automatic natural language processing (e.g., word sense disambiguation: Jin, Sun, Wu & Yu, 2007; machine translation: Wehrli, Seretan, Nerima & Russo, 2009; text classification: Williams, 2002).

Concept of collocation
Since the linguistic point we deal with concerns collocations, we first give a definition of collocation. Linguists have proposed many definitions; the first was given by Firth (1957), who sees the collocations of a given word as statements of the habitual or customary places of that word. Here are a few examples of Spanish collocations taken from Bolshakov and Miranda-Jiménez (2004): prestar atención ('pay attention', lit. 'lend attention'), presidente del país ('president of the country'), país grande ('large country'), muy bien ('very well'). In a collocation, one word dominates the other and determines its choice (Hausmann, 1984). The dominant word is called the 'base', and the other word, whose choice is not free but depends on the base, is called the 'collocate'. Thus, in the collocation prestar atención, the base is atención and the collocate is prestar; in país grande, the base is país and the collocate is grande. Semantically, the base is used in its typical meaning, while the collocate takes on another meaning that is not typical for it but is determined by the base. We now present our linguistic statement; the terms 'collocational isomorphism' and 'lexical function' are explained in the sections that follow.

Hypothesis
Collocations are not a stock or a 'bag' of word combinations, where each combination exists as a separate unit with no connection to the others, but they are related via collocational isomorphism represented as lexical functions.

Collocational isomorphism
Considering the collocations of a given natural language (this work was carried out on Spanish verb-noun collocations), one can observe that collocations are not just a 'bag' of word combinations, a collection of unrelated items among which no association can be found; instead, there are lexical relations among collocations. In particular, we study the lexical relation that may be called 'collocational isomorphism'. It bears some resemblance to synonymy among words, which is the relation of semantic identity or similarity. Collocational isomorphism is not complete equality of meaning between two or more collocations, but rather a semantic and structural similarity between them.
What do we mean by semantic and structural similarity between collocations? For convenience of explanation, we comment on structural similarity first. Structural similarity is not a novel idea: a detailed structural classification of collocations (for English) was elaborated and used to store collocational material in the well-known dictionary of word combinations The BBI Combinatory Dictionary of English (Benson, Benson & Ilson, 1997). Here, however, we illustrate collocational structures with Spanish data, listing some typical collocates of the noun alegría ('joy'):
verb + noun: sentir alegría ('to feel joy')
adjective + noun: gran alegría ('great joy')
preposition + noun: con alegría ('with joy')
noun + preposition: la alegría de (esa muchacha) ('the joy of (that girl)')
The above examples are borrowed from the dictionary of Spanish collocations entitled Diccionario de colocaciones del Español (Alonso Ramos, 2003), a collection of collocations in which the bases are nouns belonging to the semantic field of emotions.So collocations have structural similarity when they share a common syntactic structure.
We say that two or more collocations are semantically similar if they share a common semantic content. Table 1 presents collocations with the same syntactic structure, namely 'verb + noun', together with their meanings, so that we can see which semantic element is common to all of them.
Note that the meanings of all collocations in Table 1 can be generalized as 'do, carry out or realize what is denoted by the noun'; in other words, these collocations are built according to the semantic pattern 'do the noun'. In turn, observing the meanings of the nouns, we see that their semantics can be expressed in general terms as 'action' (uso, abrazo, medida) or 'psychological attribute' (atención, interés), so the resulting semantic pattern of the collocations in Table 1 is 'do an action / manifest a psychological attribute'. Since these collocations share common semantics and structure, we may say that they are isomorphic, or that they are tied to one another by the relation we termed above 'collocational isomorphism'. Table 2 gives more examples of isomorphic collocations.

Collocational isomorphism represented as lexical functions
Several attempts to conceptualize and formalize the semantic similarity of collocations have been made. As far back as 1934, the German linguist Porzig claimed that, on the syntagmatic level, the choice of words is governed not only by grammatical rules but also by lexical compatibility, and observed semantic similarity between word pairs such as dog - bark, hand - grasp, food - eat, clothes - wear. The common semantic content of these pairs is 'typical action of an object'. The research of Firth (1957) drew linguists' attention to the issue of collocation, and since then the collocational relation has been studied systematically. Flavell and Flavell (1959) and Weinreich (1969) identified the following meanings underlying collocational isomorphism: an object and its typical attribute (lemon - sour), an action and its performer (dog - bark), an action and its object (floor - clean), an action and its instrument (axe - chop), an action and its location (sit - chair, lie - bed), an action and its causation (have - give, see - show), etc.
The examples from Porzig (1934), Flavell and Flavell (1959) and Weinreich (1969) cited above are borrowed from Apresjan (1995). The next step in developing a formalism that represents the semantic relations between the base and the collocate, as well as the semantic and structural similarity between collocations, was taken by Mel'čuk. To date, his endeavor remains the most fundamental and theoretically well-grounded attempt to systematize collocational knowledge. He proposed a linguistic theory called the Meaning-Text Theory, which explains how meaning, or semantic representation, is encoded and transformed into spoken or written texts (Mel'čuk, 1974). The theory postulates that collocations are produced by a mechanism called lexical function. A lexical function is a mapping from the base to the collocate; it is a semantically marked correspondence that governs the choice of the collocate for a particular base. The following definition of lexical function is given in Mel'čuk (1996: 40): "The term function is used in the mathematical sense: f(X) = Y. … Formally, a Lexical Function f is a function that associates with a given lexical expression L, which is the argument, or keyword, of f, a set {L_i} of lexical expressions - the value of f - that express, contingent on L, a specific meaning associated with f: f(L) = {L_i}. Substantively, a Lexical Function is, roughly speaking, a special meaning (or semantico-syntactic role) such that its expression is not independent (in contrast to all "normal" meanings), but depends on the lexical unit to which this meaning applies. The core idea of Lexical Functions is thus lexically bound lexical expression of some meanings." About 70 lexical functions are identified in Mel'čuk (1996); each is associated with a particular meaning, after which it is named. The name of a lexical function is an abbreviated Latin word whose semantic content is closest to the meaning of the function.
Using this notation, the collocation dar un paseo (lit. 'give a walk', i.e., 'take a walk') is represented as Oper1(paseo) = dar, where 'Oper' comes from Latin operari ('do, carry out'); the argument, or keyword, of this lexical function is paseo, and its value is dar. The subscript 1 encodes the syntactic structure of utterances in which the keyword of Oper1(paseo) is used together with its value (dar): the first argument of paseo (the Agent) is lexicalized as the grammatical subject, as in Mi abuela (Agent) da un paseo por este parque cada sábado ('My grandma takes a walk in this park every Saturday'). Other collocations isomorphic to dar un paseo can be represented likewise; in fact, they are the collocations in Table 1: hacer uso is represented as Oper1(uso) = hacer, dar un abrazo as Oper1(abrazo) = dar, prestar atención as Oper1(atención) = prestar, etc. Another example of a lexical function is Func0, from Latin functionare ('function'). The keyword of Func0 can be an action, activity, state, property or relation; the value of Func0 has the meaning 'happen, take place, realize itself'; and the subscript 0 indicates that the keyword functions as the grammatical subject: Func0(viento) = soplar (el viento sopla, 'the wind blows'), Func0(silencio) = reinar (el silencio reina, lit. 'the silence reigns'), Func0(accidente) = ocurrir (el accidente ocurre, 'the accident happens'). The lexical function Real_n (n = 0, 1, 2, ...), from Latin realis ('real'), means 'to fulfill the requirement of the keyword', 'to do with the keyword what one is supposed to do with it', or 'the keyword fulfills its requirement'. In particular, Real1 means 'use the keyword according to its destination', i.e., 'do with regard to the keyword that which is normally expected of its first participant': conceder amistad a alguien ('offer one's friendship to somebody'), dar cariño ('give affection'), consumirse en los celos ('be consumed by jealousy'). Real2 means 'do with regard to the keyword that which is normally expected of its second participant': recibir cariño de alguien ('receive affection from somebody'), aprobar el examen ('pass the exam'), vengar la ofensa ('avenge the offense').
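Since a lexical function is defined as a mapping f(L) = {L_i} from a keyword to a set of values, it can be pictured as a lookup table. The Python sketch below is purely illustrative: the table `LEXICAL_FUNCTIONS` and the helpers `apply_lf` and `isomorphic` are hypothetical names, the table holds only the examples discussed above, and 'isomorphic' is approximated here as 'generated by the same lexical function'.

```python
# Hypothetical toy lexicon: lexical function -> keyword (base) -> set of values
# (collocates), i.e. f(L) = {L_i}. Only examples from the text are included.
LEXICAL_FUNCTIONS: dict[str, dict[str, set[str]]] = {
    "Oper1": {"paseo": {"dar"}, "uso": {"hacer"},
              "abrazo": {"dar"}, "atención": {"prestar"}},
    "Func0": {"viento": {"soplar"}, "silencio": {"reinar"},
              "accidente": {"ocurrir"}},
}


def apply_lf(lf: str, keyword: str) -> set[str]:
    """Value of lexical function `lf` for `keyword` (empty set if not listed)."""
    return LEXICAL_FUNCTIONS.get(lf, {}).get(keyword, set())


def isomorphic(c1: tuple[str, str], c2: tuple[str, str]) -> bool:
    """Approximation used here: two (collocate, base) pairs are isomorphic
    if one and the same lexical function yields both collocates."""
    return any(c1[0] in table.get(c1[1], set()) and c2[0] in table.get(c2[1], set())
               for table in LEXICAL_FUNCTIONS.values())


print(apply_lf("Oper1", "paseo"))                             # {'dar'}
print(isomorphic(("dar", "paseo"), ("prestar", "atención")))  # True
print(isomorphic(("dar", "paseo"), ("soplar", "viento")))     # False
```

In this picture, the groups of isomorphic collocations are simply the entries of one inner table.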
In some cases, a given lexical function represents one elementary meaning, as do Oper, Func and Real, whose meanings we have explained and illustrated. More functions representing elementary meanings have been identified: Labor (Lat. laborare, 'to work, toil'), Incep (Lat. incipere, 'to begin'), Cont (Lat. continuare, 'to continue'), Fin (Lat. finire, 'to cease'), Caus (Lat. causare, 'to cause'), Perm (Lat. permittere, 'to permit'), Liqu (Lat. liquidare, 'to liquidate'), etc. In many more cases, however, the semantic content of the verb in a verb-noun collocation is complex and includes several elementary meanings. For example, consider the meaning 'begin to realize an action or begin to manifest an attribute' from Table 2, which consists of two elements: 'begin' and 'realize / manifest'. To represent such compound meanings, complex lexical functions are used; these are combinations of elementary lexical functions, termed 'simple lexical functions'. All lexical functions exemplified above are simple.
Note that the collocational meanings in Table 2 correspond to complex lexical functions. Table 3 presents the meanings from Table 2 using the lexical function formalism, accompanied by sample collocations also taken from Table 2. The notation includes the names of lexical functions and, encoded in subscripts, syntactic information about the grammatical functions of lexicalized semantic roles. To preserve the complete notation, we keep both the function names and the subscripts, but since we are interested in the semantic aspect of collocations, we leave the subscripts unexplained here; a detailed description of subscripts and their meanings can be found in Mel'čuk (1996). The complex lexical functions in Table 3 include the following simple lexical functions as constituents: Caus, Func, Plus, Minus, Incep, Cont, Oper. All of them except Plus and Minus were introduced earlier in this section; Plus ('more') and Minus ('less') are self-explanatory.
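Because a complex lexical function is a combination of simple ones, its name can be read as a sequence of simple names, each optionally followed by a subscript. The sketch below assumes that every complex name is a plain concatenation of the simple names listed in this section, which simplifies Mel'čuk's full notation. It splits names such as IncepOper1 into their constituents; `SIMPLE_LFS` and `decompose` are hypothetical.

```python
import re

# Simple lexical functions named in this section (illustrative, not exhaustive).
SIMPLE_LFS = ["Oper", "Func", "Real", "Labor", "Incep", "Cont", "Fin",
              "Caus", "Perm", "Liqu", "Plus", "Minus"]

# One simple LF name optionally followed by a numeric subscript, e.g. "Oper1", "Func0".
_TOKEN = re.compile(r"(" + "|".join(SIMPLE_LFS) + r")(\d*)")


def decompose(name: str) -> list[tuple[str, str]]:
    """Split a complex LF name such as 'IncepOper1' into (simple LF, subscript) pairs."""
    parts, pos = [], 0
    while pos < len(name):
        m = _TOKEN.match(name, pos)
        if m is None:
            raise ValueError(f"unrecognised lexical function at {name[pos:]!r}")
        parts.append((m.group(1), m.group(2)))
        pos = m.end()
    return parts


print(decompose("IncepOper1"))  # [('Incep', ''), ('Oper', '1')]
print(decompose("CausFunc0"))   # [('Caus', ''), ('Func', '0')]
```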
As mentioned, about 70 lexical functions were distinguished in Mel'čuk (1996). The only other existing typology based on semantic and syntactic features includes only 15 types of collocations (Benson et al., 1997). Lexical functions can therefore serve as a more detailed representation of collocational isomorphism. We now examine whether computer experiments can supply evidence for the existence of collocational isomorphism as defined by lexical functions. The idea is to submit a list of collocations to the computer and see whether it can distinguish collocations belonging to different lexical functions. If a machine can recognize lexical functions, this is strong evidence of their existence.

Outline of the experimental procedure
This section gives an overview of the experiments and of how the data analysis was accomplished. Basically, the experiments consist of asking the computer whether or not a given collocation belongs to a particular lexical function. For example, the computer has to decide whether iniciar la sesión from Table 3 is IncepOper1. The computer makes its decision after analyzing the data set. Eight lexical functions were chosen for the experiments on the basis of the data available to us, as further explained in Section 2.2.

Figure 1. Algorithm: constructing data sets
Input: a list of 900 Spanish verb-noun collocations annotated with 8 lexical functions
Output: 8 data sets, one for each lexical function
For each lexical function:
    Create an empty data set and assign it the name of the lexical function.
    For each collocation in the list of verb-noun collocations:
        Retrieve all hyperonyms of the noun.
        Retrieve all hyperonyms of the verb.
        Make a set of hyperonyms: {noun, all hyperonyms of the noun, verb, all hyperonyms of the verb}.
        If the collocation belongs to this lexical function, assign '1' to the set of hyperonyms;
        else assign '0' to the set of hyperonyms.
        Add the set of hyperonyms to the data set.
    Return the data set.

For each of the eight lexical functions, a data set is compiled according to the algorithm presented in Figure 1. The input to this algorithm is a list of 900 Spanish verb-noun collocations annotated with eight lexical functions; the output is eight data sets, one for each of the selected lexical functions. The data sets include all hyperonyms of each verb and all hyperonyms of each noun in the collocations. The hyperonyms are retrieved from the Spanish WordNet, an electronic dictionary. More details about the list of collocations and the data sets are given in Section 2.2.
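The algorithm in Figure 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' code. Real hyperonyms would come from the Spanish WordNet; here a hypothetical table `HYPERNYM` maps each sense to its direct hyperonym, with each synset reduced to one member word, and its noun chain loosely follows the hyperonym set given for hacer una broma (Section 2.2). The input triple format and the helper names are also assumptions.

```python
# Hypothetical toy hypernym table: sense -> direct hyperonym (one word per synset).
HYPERNYM = {
    "broma_1": "humorada_1",
    "humorada_1": "contenido_2",
    "contenido_2": "comunicación_2",
    "comunicación_2": "relación_social_1",
    "relación_social_1": "relación_4",
    "relación_4": "abstracción_6",
    "hacer_15": "actuar_2",
}


def hypernyms(sense: str) -> list[str]:
    """The sense itself (zero-level hyperonym) followed by all its ancestors."""
    chain = [sense]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def build_dataset(collocations, lexical_function):
    """One data set for one LF: a (hyperonym set, 1/0 label) pair per collocation.

    `collocations` holds (verb_sense, noun_sense, tag) triples; free word
    combinations carry a tag that is not the name of any lexical function.
    """
    dataset = []
    for verb, noun, tag in collocations:
        features = frozenset(hypernyms(noun) + hypernyms(verb))
        label = 1 if tag == lexical_function else 0
        dataset.append((features, label))
    return dataset


def build_all(collocations, lexical_functions):
    """Figure 1: one data set per lexical function."""
    return {lf: build_dataset(collocations, lf) for lf in lexical_functions}


collocations = [("hacer_15", "broma_1", "Oper1"), ("dar_1", "cosa_1", "free")]
datasets = build_all(collocations, ["Oper1", "Func0"])
print(datasets["Oper1"][0][1], datasets["Func0"][0][1])  # 1 0
```

Every data set contains the same feature sets; only the labels change from one lexical function to the next.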
The next stage of the experimental procedure is to submit the data sets to machine learning techniques, which construct models for making decisions. How the data are analyzed and how the model is built is specific to each machine learning method. The models are evaluated to see whether a method is precise enough in detecting lexical functions; the evaluation is done in terms of F-measure. The methodology is further explained in Section 2.3, and the experimental results are given in Section 2.5.

Data
This section gives more details on the data used and on how the data sets for the machine learning experiments were compiled. The experiments were carried out on Spanish verb-noun collocations such as formar un grupo ('form a group'), dar una conferencia ('give a lecture'), destacar la importancia ('emphasize the importance') and presentar información ('present information'). The 900 verb-noun pairs were extracted automatically from the Spanish Web Corpus using the Sketch Engine, a software tool for automatic text processing (Kilgarriff, Rychly, Smrz & Tugwell, 2004). The Spanish Web Corpus contains 116,900,060 tokens and is compiled from texts found on the Internet. The texts are not limited to a particular topic but cover any theme that can be discussed on the World Wide Web. We extracted the verb-noun combinations that occur most frequently in the Spanish Web Corpus; they are therefore the most common verb-noun combinations in contemporary Spanish Internet communication.
The list of 900 verb-noun pairs was manually annotated with lexical functions. Some verb-noun pairs in the list were not collocations, so they were tagged as 'free word combinations'. For example, poner un ejemplo ('give an example', CausFunc0), tener un efecto ('have an effect', Oper1) and dar un salto ('make a leap', Oper1) are collocations characterized by lexical functions, whereas dar una cosa ('give a thing'), tener casa ('have a house') and dar la mano ('give one's hand') are free word combinations. Among the 900 most frequent verb-noun pairs, there were 261 free word combinations and 639 collocations belonging to 36 lexical functions.
As noted in Section 1.4, about 70 lexical functions have been identified in total. This number includes lexical functions found in collocations of various structures: noun-noun, adjective-noun, verb-noun, verb-adverb, etc. In this work, we study only Spanish verb-noun collocations, and we were interested in the lexical functions found among the most frequent of them. The list of 900 verb-noun pairs described above turned out to contain 36 lexical functions. However, only eight of these 36 lexical functions have enough collocations for computer experiments, so only they were selected for the machine learning experiments. The chosen lexical functions are shown in Table 4, and the number of collocations for each lexical function is given in Table 5.
The next step in data preparation was to determine the sense in which each word was used in the collocations. To this end, every noun and every verb in the list was manually disambiguated with the word senses of the Spanish WordNet (Vossen, 1998). Word senses in this dictionary are designated by numbers and represented by synsets, or synonym sets, consisting of words that are synonymous with each other and name one concept. A synset may be accompanied by a brief definition, or 'gloss'. Below we give all senses of the word broma ('joke') found in the Spanish WordNet; each sense has its number, synset and gloss, and the words in synsets are written in the form 'word_sense number':
Sense 1: broma_1 jocosidad_1 chanza_1 ocurrencia_1 gracia_1 chiste_1 - a humorous anecdote or remark
Sense 2: broma_2 vacilada_1 burla_4 - a ludicrous or grotesque act done for fun and amusement
Sense 3: broma_3 jocosidad_3 - activity characterized by good humor
Sense 4: broma_4 teredo_1 - typical shipworm
After word sense disambiguation, we extracted the hyperonyms of all words in the collocations, again from the Spanish WordNet. The purpose was to represent the meaning of each verb-noun collocation by all hyperonyms of the verb and all hyperonyms of the noun. As an example, take hacer una broma ('make a joke'). Disambiguating this collocation, we get hacer_15 broma_1. In Fig. 2 we list all hyperonyms of hacer_15 and broma_1. The words of the collocation themselves, i.e., hacer_15 and broma_1, are considered hyperonyms of themselves, or zero-level hyperonyms, and are included in the hyperonym set.
Thus the meaning of hacer una broma is represented as the hyperonym set {hacer_15, efectuar_1 realizar_6 llevar_a_cabo_5 hacer_15, actuar_2 llevar_a_cabo_3 hacer_8, broma_1 jocosidad_1 chanza_1 ocurrencia_1 gracia_1 chiste_1, humorada_1 jocosidad_2, contenido_2 mensaje_1, comunicación_2, relación_social_1, relación_4, abstracción_6}, which is supplied to the computer. All 900 collocations are represented likewise and become the input data for the machine learning techniques. We believe that hyperonym sets have the power to distinguish collocations belonging to different lexical functions and can therefore serve as a sufficient semantic description of collocations for our purpose.

Methodology
The task of the computer is to look through the collocations of a given lexical function, for example Oper1, marked as the class 'yes' according to the algorithm in Figure 1; to compare them with the rest of the input data, i.e., with the collocations of all other lexical functions and the free verb-noun combinations, marked as the class 'no'; and to identify which features are characteristic of Oper1 collocations (in our data, the features are hyperonyms). In other words, the computer must find what features distinguish Oper1 from the other lexical functions. This knowledge is used later, when the computer is given a list of collocations whose lexical functions are unknown to it; its task is then to examine these collocations and determine which of them belong to Oper1.
Techniques used for tasks like the one we have just described belong to machine learning, a class of methods developed in the area of artificial intelligence. These methods are based on various mathematical or statistical models and are applied to extract knowledge from data: to find patterns in data, to build structural descriptions of data items, and to classify these items. The field of machine learning has its own concepts and terminology, and we introduce some of its terms in the course of this section to make our exposition more precise. Pieces of data examined by machine learning techniques are called instances, or examples. In our case, an example is the set of all hyperonyms of a particular collocation, as represented in Figure 2. For the purpose of evaluation, various methods and metrics are applied. In our experiments, we used the method called 10-fold cross-validation and the metric called F-measure. We do not explain this method and metric here, since it is not our purpose to go into mathematical and computational details; a more technically oriented reader may wish to consult (The University of Waikato, 2010a) on these topics. However, we give a brief interpretation of F-measure later in this section.
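For readers unfamiliar with 10-fold cross-validation, here is a rough sketch of the idea. The data are split into ten folds; each fold is held out once for testing while a model is trained on the other nine, and the ten test scores are combined. The sketch below is plain (unstratified) and averages the per-fold scores. WEKA's own procedure may differ in such details, and `k_fold_indices` and `cross_validate` are hypothetical helpers.

```python
import random
from collections.abc import Callable, Sequence


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> list[list[int]]:
    """Shuffle the indices 0..n-1 and deal them round-robin into k folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(examples: Sequence, train: Callable, evaluate: Callable,
                   k: int = 10) -> float:
    """Plain k-fold cross-validation returning the mean of the per-fold scores.

    `train(train_set)` must return a model and `evaluate(model, test_set)` a
    number (for instance, an F-measure on the held-out fold).
    """
    scores = []
    for fold in k_fold_indices(len(examples), k):
        held_out = set(fold)
        train_set = [e for i, e in enumerate(examples) if i not in held_out]
        test_set = [examples[i] for i in fold]
        scores.append(evaluate(train(train_set), test_set))
    return sum(scores) / len(scores)


# Toy check with 900 dummy examples: each model is trained on 810 and tested on 90.
print(cross_validate(list(range(900)), train=len, evaluate=lambda m, t: m + len(t)))  # 900.0
```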
We planned to compare the performance of two kinds of techniques: those based on word frequency counts and those based on rules. As noted earlier in this section, machine learning methods take advantage of various mathematical or statistical models. Broadly, these models follow one of two strategies, or approaches. Many methods have been developed and implemented in WEKA; they vary in detail but remain within the boundaries of one strategy or the other. We now turn to the two approaches.

Two approaches in machine learning: Word frequency count and rules
The first approach to finding patterns in data is to count how many times each feature, also called an attribute, occurs in the examples of each class. This gives an estimate of how confidently an unseen example can be assigned to one class or another. Applied to our linguistic data, methods based on frequency counts, or statistical methods, use Bayes' theorem to calculate the probability that a collocation in the input data belongs to the lexical function under study:

$$P(LF \mid H_1, \ldots, H_n) = \frac{P(H_1, \ldots, H_n \mid LF)\, P(LF)}{P(H_1, \ldots, H_n)}$$

where LF is any of the eight lexical functions in our experiments; H_1, ..., H_n is the set of all hyperonyms of a collocation; P(LF | H_1, ..., H_n) is the conditional probability that the collocation belongs to the given lexical function given that it has the hyperonym set H_1, ..., H_n; P(LF) is the prior probability of the lexical function; P(H_1, ..., H_n | LF) is the conditional probability of the hyperonym set given that the collocation belongs to the lexical function; and P(H_1, ..., H_n) is the probability of the hyperonym set.
Many statistical machine learning methods use Bayes' formula together with optimizations and improvements, but all are based on probabilistic knowledge. More details, including formulas for calculating probabilities and measures of likelihood, can be found in Witten and Frank (2005). A limitation of statistical methods such as Naïve Bayes is the assumption that all features in the data (hyperonyms in our case) are equally important in deciding which class to assign to an example, and that they are independent of one another. This is a rather simplified view of data, because in many cases features are neither equally important nor independent; this is certainly true of linguistic data, and especially of hyperonyms. Hyperonyms form a hierarchical structure called a tree, in which every hyperonym has a parent (except the one at the root) and one or more children (except those at the leaves), so the presence of a hyperonym in a collocation's set implies the presence of all its ancestors. Although statistical methods have weak points, they perform well enough on such linguistic tasks as automatic speech recognition, part-of-speech tagging, word sense disambiguation and machine translation. In this work, we study how statistical methods perform on the task of assigning lexical functions to collocations.
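Under the independence assumption just described, the likelihood P(H_1, ..., H_n | LF) is treated as a product of separate per-hyperonym terms, which is what makes the method 'naïve'. A minimal sketch of such a classifier over hyperonym sets follows. It is a Bernoulli-style variant with add-one (Laplace) smoothing, chosen for illustration, and it is not necessarily identical to the Naïve Bayes implementation in WEKA. The input format, a list of (set of hyperonyms, class label) pairs mirroring the data sets of Figure 1, and the toy feature names are assumptions.

```python
import math
from collections import Counter


class BernoulliNaiveBayes:
    """Naive Bayes over binary 'hyperonym present / absent' attributes."""

    def fit(self, dataset):
        # dataset: list of (set of hyperonyms, class label) pairs
        self.vocab = set().union(*(feats for feats, _ in dataset))
        self.class_counts = Counter(label for _, label in dataset)
        self.feature_counts = {c: Counter() for c in self.class_counts}
        for feats, label in dataset:
            self.feature_counts[label].update(feats)
        self.n = len(dataset)
        return self

    def _log_score(self, feats, c):
        n_c = self.class_counts[c]
        score = math.log(n_c / self.n)  # log P(LF), the class prior
        for h in self.vocab:
            # Smoothed P(h present | class); absent hyperonyms contribute 1 - p.
            p = (self.feature_counts[c][h] + 1) / (n_c + 2)
            score += math.log(p if h in feats else 1 - p)
        return score  # log P(class | H) up to the constant log P(H)

    def predict(self, feats):
        return max(self.class_counts, key=lambda c: self._log_score(feats, c))


toy = [(frozenset({"a", "x"}), "yes"), (frozenset({"a", "y"}), "yes"),
       (frozenset({"b", "x"}), "no"), (frozenset({"b", "y"}), "no")]
model = BernoulliNaiveBayes().fit(toy)
print(model.predict({"a"}), model.predict({"b"}))  # yes no
```

Note that each hyperonym's evidence is simply added in log space, even when one hyperonym is an ancestor of another; this is exactly the simplification criticized above.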
The second approach in machine learning is based on rules. The computer examines the data and looks for rules that can be inferred from them. Rules are conditional statements of the form 'If <condition> then <conclusion>'.
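The rule-based approach can be illustrated with one of the simplest rule learners, Holte's OneR, which is also available in WEKA. The sketch below applies the OneR idea to binary 'hyperonym present' attributes. For every hyperonym, it builds a rule of the form 'if the collocation has this hyperonym, then class X, else class Y', and it keeps the rule with the fewest training errors. It is only meant to show what a learned rule looks like, not to reproduce any of the methods whose results appear in Table 6. `one_r` and the toy feature names are hypothetical.

```python
from collections import Counter


def one_r(dataset):
    """OneR-style learner over binary 'hyperonym present' attributes.

    dataset: list of (set of hyperonyms, class label) pairs.
    Returns (rule, attribute), where rule(features) -> predicted label.
    """
    vocab = set().union(*(feats for feats, _ in dataset))
    best = None
    for h in sorted(vocab):  # sorted for deterministic tie-breaking
        with_h = Counter(label for feats, label in dataset if h in feats)
        without_h = Counter(label for feats, label in dataset if h not in feats)
        then_label = with_h.most_common(1)[0][0]
        else_label = without_h.most_common(1)[0][0] if without_h else then_label
        # Training errors: examples outside the majority class of their branch.
        errors = (sum(with_h.values()) - with_h[then_label]
                  + sum(without_h.values()) - without_h[else_label])
        if best is None or errors < best[0]:
            best = (errors, h, then_label, else_label)

    _, attr, then_label, else_label = best

    def rule(feats):
        # IF attr in features THEN then_label ELSE else_label
        return then_label if attr in feats else else_label

    return rule, attr


toy = [(frozenset({"a", "x"}), "yes"), (frozenset({"a", "y"}), "yes"),
       (frozenset({"b", "x"}), "no"), (frozenset({"b", "y"}), "no")]
rule, attr = one_r(toy)
print(attr, rule({"a", "y"}), rule({"b"}))  # a yes no
```

Unlike the Naïve Bayes sketch, such a rule ignores all other hyperonyms, so redundant ancestors do not inflate its evidence.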

Experimental results
As explained in Section 2 … (The University of Waikato, 2010a). Values of F-measure lie in the range from 0 to 1. A value of 0 means that the computer did not correctly identify any collocation of the lexical function; a value of 1 means that it identified all of them without any false positives. Thus, the higher the F-measure, the better the computer's performance; in our case, the more precise its recognition of lexical functions.
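F-measure combines precision (the share of collocations labeled 'yes' that really belong to the lexical function) and recall (the share of collocations of the lexical function that are labeled 'yes'). The sketch below assumes the usual balanced form (F1). The counts in the usage lines are hypothetical, and returning 0 when there are no true positives is a convention assumed here.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure (F1) for the class 'yes'.

    tp: collocations correctly labeled 'yes'; fp: wrongly labeled 'yes';
    fn: collocations of the lexical function that were labeled 'no'.
    """
    if tp == 0:
        # No correct 'yes' answers: precision or recall is 0 (or undefined),
        # and by convention the F-measure is reported as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


print(f_measure(tp=30, fp=0, fn=0))             # 1.0: all found, no false positives
print(f_measure(tp=0, fp=5, fn=30))             # 0.0: nothing found
print(round(f_measure(tp=20, fp=10, fn=10), 3))  # 0.667
```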
The training set submitted to the WEKA techniques, or classifiers, is built according to the algorithm in Figure 1. In fact, we experimented with eight training sets, or data sets, one for each lexical function. Each data set contains the same 900 verb-noun pairs represented as sets of hyperonyms; the only difference between the data sets is the value of the class variable, which is 'yes' if a collocation belongs to the given lexical function and 'no' if it does not. … 5, so ZeroR is not a meaningful baseline.
However, the baseline can be a random choice between a positive and a negative answer to the question 'Does this collocation belong to this particular lexical function?'. In this case, we deal with the probabilities of a positive and a negative response. Since we are interested only in assigning the positive answer to a collocation, we calculate the probability of the 'yes' class for each of the eight lexical functions as: probability of 'yes' = 1 / (number of all examples / number of positive examples of the given lexical function), i.e., the number of positive examples divided by the number of all examples. These probabilities are the results of a classifier that assigns the class 'yes' to collocations at random. Since we compare the probabilities of the random choice with the results obtained in our experiments, we present the former as numbers in the range from 0 to 1 in Table 5 as well as in Table 6.
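The baseline formula simplifies to the share of positive examples, n_positive / n_total. The sketch below, with hypothetical counts, computes it exactly and also suggests why it is a reasonable F-measure baseline. A classifier that answers 'yes' at random with probability q has, approximately (for large data sets), precision equal to the share of positives p and recall equal to q. Its F-measure is therefore about 2pq / (p + q), which equals p when q = p. The function names are hypothetical.

```python
from fractions import Fraction


def random_baseline(n_positive: int, n_total: int) -> Fraction:
    """P('yes') = 1 / (n_total / n_positive), i.e. n_positive / n_total."""
    return 1 / Fraction(n_total, n_positive)


def random_f_measure(p: Fraction, q: Fraction) -> Fraction:
    """Approximate F-measure of a classifier that says 'yes' at random.

    p: share of positive examples (approximate precision);
    q: probability of answering 'yes' (approximate recall).
    """
    return 2 * p * q / (p + q)


p = random_baseline(90, 900)                # hypothetical counts
print(p)                                    # 1/10
print(random_f_measure(p, p))               # 1/10: F equals p when q = p
print(random_f_measure(p, Fraction(1, 2)))  # 1/6
```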
We obtained F-measure values for all methods applied in the experiments and, for each lexical function, chose the top-performing rule-based techniques. Table 6 presents the results of these techniques together with the results of the statistical method called Naïve Bayes (Witten & Frank, 2005). Naïve Bayes is widely used in natural language processing and has proved to be one of the most effective methods for linguistic tasks; see, for example, Provost (1999). In our experiments, however, this method showed an average F-measure of only 0.145 and was not able to detect some lexical functions at all (an F-measure of 0.000 for IncepOper1, ContOper1, Oper2 and Func0), while rule-based methods substantially outperformed Naïve Bayes, reaching an average F-measure of 0.759. As Table 6 shows, for six of the eight lexical functions the performance of Naïve Bayes is even below the baseline, which is itself rather low, though the average result of this method is slightly higher than the average baseline. In contrast, the results of rule-based methods are well above the baseline.
'descend in free fall under the influence of gravity', e.g., 'The branch fell from the tree'. Fall reveals its characteristic meaning in free word combinations and its more abstract sense in collocations. What do we mean by a more abstract sense? An abstract sense is neither independent nor complete; it can rather be called a 'semantic particle', whose function is not to express full semantics but to add semantic features to the base of the collocation.
To explain what we mean by 'adding semantic features to the base', let us draw an analogy with the semantics of grammatical categories, which is also very abstract. The verb be in its function as an auxiliary does not express any meaning except abstract grammatical categories such as tense, voice, number and person. In the sentence 'This castle was built in the 15th century', the verb build carries the meaning of an action, and be adds semantic features to it: the action took place in the past, it is passive rather than active, and it applies to a single object, since the grammatical number of was is singular. Likewise, fall does not express an event or a state; it adds the semantic feature 'begin to occur' to the word denoting an event or state.
According to the semantic definition of collocation, a collocation differs from a free word combination in the way its meaning is constructed. While the meaning of a free word combination is the sum of the meanings of its elements, collocational meaning is formed by adding the more abstract semantic features expressed by the collocate to the full meaning of the base.
Our experiments showed that collocations are recognized better by rules, that is, by conceptual knowledge. This means that the basic criterion for distinguishing collocations from free word combinations is semantic, which gives good reason to base the definition of collocation on semantic rather than statistical criteria.

CONCLUSIONS
The computer experiments we conducted on Spanish verb-noun collocations verify the linguistic hypothesis that collocations are not a random stock of word combinations but are semantically and syntactically related to one another. We have found that collocations with the same syntactic structure, namely verb + noun, are organized in groups with similar semantics. This similarity is represented by the formalism of lexical functions (Mel'čuk, 1996).
We experimented with 20 statistical and 21 rule-based machine learning techniques on the training set of Spanish verb-noun collocations annotated with eight lexical functions. The results show that rule-based methods significantly outperform statistical methods. In particular, we compared the best rule-based methods for detecting lexical functions with Naïve Bayes, one of the most efficient methods in natural language processing, on the same task. Naïve Bayes reached an average F-measure of 0.145, whereas the rule-based methods reached an average of 0.759. This shows that rules capture significant semantic features of collocations, which are sufficient for discerning collocational meaning as represented by lexical functions.
Statistical machine learning methods use models built on probabilistic knowledge, while rule-based methods take advantage of conceptual knowledge. Concepts are semantic units, and rules are a means of identifying concepts. Therefore, the better performance of rule-based methods compared with statistical methods demonstrates that the semantic approach is more helpful in exploring the nature of collocation as a linguistic phenomenon.

NOTES
1 A hyperonym of a word A is a word B such that A is a kind of B. For example, 'flower' is a hyperonym of 'rose', 'daisy', 'tulip', and 'orchid', since each of them is a kind of flower. In turn, the hyperonym of 'flower' is 'plant', the hyperonym of 'plant' is 'living thing', and the hyperonym of 'living thing' is 'entity'. Thus the hyperonyms of a single word form a chain (rose → flower → plant → living thing → entity), and all words connected by the 'kind-of' relation, or hyperonymy, form a tree.
2 The Spanish Web Corpus is accessible only through the Sketch Engine; information on the corpus can be found at http://trac.sketchengine.co.uk/wiki/Corpora/SpanishWebCorpus/
3 The notion of token is used mainly in computational linguistics. A token is a string of symbols in text delimited by white space or punctuation marks.
A token is not a word but any concrete occurrence of a word, number, or other symbol in text. For example, the sentence 'I saw him but he did not see me' contains nine tokens but eight words ('saw' and 'see' are forms of the same word). Sometimes, when speaking of a corpus, the term 'word' is used in the sense of 'token', for example, 'This corpus contains a million words'.
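The 'kind-of' relation described in note 1 can be pictured as a parent map in which each word points to its single hyperonym; following the pointers upward yields the chain, and because every word has at most one parent, the whole structure is a tree. The minimal Python sketch below uses only the words from the note; the names HYPERONYM and hyperonym_chain are illustrative, and a real lexical resource would supply far more entries.

```python
# Each word maps to its hyperonym (the more general word); 'entity' is the root.
HYPERONYM = {
    "rose": "flower", "daisy": "flower", "tulip": "flower", "orchid": "flower",
    "flower": "plant",
    "plant": "living thing",
    "living thing": "entity",
}


def hyperonym_chain(word: str) -> list:
    """Follow hyperonym links from `word` up to the root of the tree."""
    chain = [word]
    while chain[-1] in HYPERONYM:
        chain.append(HYPERONYM[chain[-1]])
    return chain


print(" → ".join(hyperonym_chain("rose")))
# rose → flower → plant → living thing → entity
```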
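The token/word distinction in note 3 can be checked on the example sentence with a few lines of Python. This is a sketch under simplifying assumptions: tokens are taken to be runs of letters or digits, so punctuation only separates tokens (many tokenizers also emit punctuation marks as tokens of their own), and the lemma table is a hypothetical, minimal one that merges only 'saw' with 'see', mirroring the counting in the note.

```python
import re


def tokenize(text: str) -> list:
    """Split text into tokens: runs of letters/digits, with white space
    and punctuation acting as separators (punctuation is dropped)."""
    return re.findall(r"\w+", text)


# Hypothetical lemma table covering only this sentence: 'saw' -> 'see'.
LEMMA = {"saw": "see"}

sentence = "I saw him but he did not see me"
tokens = tokenize(sentence)
words = {LEMMA.get(t.lower(), t.lower()) for t in tokens}
print(len(tokens), len(words))  # 9 8
```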

Figure 1. Algorithm for compiling the data sets.

Table 1. Verb-noun collocations and their meaning.

Table 2. Verb-noun collocations grouped according to their common semantic pattern.

Table 3. Semantic patterns represented as lexical functions.

Table 4. Lexical functions chosen for the experiments.

Table 5. Probability of selecting the 'yes' class at random.

Table 6. Performance of statistical and rule-based methods.

Usually, in machine learning experiments, and in those using WEKA in particular, the classifier ZeroR is chosen as the baseline. ZeroR is a trivial classifier that assigns the majority class to all examples. In our experiments, the majority class is always 'no', since the number of negative examples for each lexical function is much larger than the number of its positive examples. For example, Oper1 has 280 positive examples and 900 - 280 = 620 negative examples, and each of the other seven lexical functions has even fewer positive examples than Oper1. Table 6 gives the names of the rule-based methods as implemented in WEKA and presents their results, as well as the number of examples for each lexical function in the training set. To compare our results with the baseline explained in the previous paragraphs, the same table also gives the probability of selecting the class 'yes' at random.
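The two baselines discussed above can be sketched in Python as follows. This is an illustration, not WEKA's ZeroR implementation: the helper names are hypothetical, the example strings are placeholders, and we assume the random-selection probability is simply the proportion of positive examples. With the Oper1 counts from the text, ZeroR predicts 'no' for every example and therefore finds no positive examples (an F-measure of 0 on the 'yes' class), while the chance of picking a 'yes' example at random is 280/900, roughly 0.311.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable


def zero_r(train_labels: list[str]) -> Callable[[str], str]:
    """ZeroR-style baseline: return a predictor that always outputs the
    majority class of the training labels (ties go to the first seen)."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda _example: majority


def random_yes_probability(n_positive: int, n_total: int) -> float:
    """Chance of drawing a 'yes' example at random, assumed here to be
    the share of positive examples in the data set."""
    return n_positive / n_total


# Oper1 counts from the text: 280 positive, 900 - 280 = 620 negative.
labels = ["yes"] * 280 + ["no"] * 620
examples = [f"collocation_{i}" for i in range(900)]  # placeholder inputs
predict = zero_r(labels)

# ZeroR never outputs 'yes', so it detects no positive examples at all.
hits = sum(1 for x, gold in zip(examples, labels)
           if gold == "yes" and predict(x) == "yes")
print(predict(examples[0]), hits)                 # no 0
print(round(random_yes_probability(280, 900), 3))  # 0.311
```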