RLA. Revista de lingüística teórica y aplicada
Online version ISSN 0718-4883
CASTELLON, IRENE et al. Building a verbal semantics corpus of Spanish: Methodology for labelling phrase heads. RLA [online]. 2012, vol.50, n.1, pp.13-38. ISSN 0718-4883. http://dx.doi.org/10.4067/S0718-48832012000100002.
The SenSem Corpus and Database (Alonso, Capilla, Castellón, Fernández and Vázquez, 2007) consists of a verb-oriented, balanced corpus of Spanish linked to a syntactic and semantic database of predicates and sentences. The corpus contains 100 sentences for each of the 250 most frequent verbs of Spanish. It is labelled with rich semantic and syntactic information, which the database structures according to verb senses. This makes it an invaluable resource for verb-focused empirical linguistic research. In this paper we present the process and methodology adopted for labelling the nominal heads of argument structures with WordNet sense IDs. As by-products, we discuss a critical assessment of Spanish WordNet 1.6 as a resource for semantic labelling and provide a guide to labelling criteria, both of which may be useful in similar future research.
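To make the data organisation described above more concrete, the following is a minimal Python sketch, under stated assumptions, of sentences indexed by verb sense whose nominal argument heads carry sense IDs. Everything in it is hypothetical: the class and field names, the verb-sense labels (`comer_1`) and the placeholder sense IDs (`SENSE-A` and so on). It does not reproduce the actual SenSem schema or the Spanish WordNet 1.6 identifier format. The only figures taken from the abstract are 100 sentences per verb and 250 verbs.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ArgumentHead:
    lemma: str     # nominal head of an argument phrase
    sense_id: str  # HYPOTHETICAL placeholder, not the real WordNet ID format


@dataclass
class AnnotatedSentence:
    verb: str
    verb_sense: str  # HYPOTHETICAL verb-sense label
    text: str
    heads: list = field(default_factory=list)  # list of ArgumentHead


def group_by_verb_sense(sentences):
    """Index sentences by (verb, verb_sense), mimicking a sense-structured database."""
    index = defaultdict(list)
    for s in sentences:
        index[(s.verb, s.verb_sense)].append(s)
    return dict(index)


def heads_for_sense(index, verb, verb_sense):
    """Collect, in order, the sense IDs assigned to argument heads for one verb sense."""
    return [h.sense_id for s in index.get((verb, verb_sense), []) for h in s.heads]


# Corpus size implied by the abstract: 100 sentences for each of 250 verbs.
SENTENCES_PER_VERB = 100
NUM_VERBS = 250
TOTAL_SENTENCES = SENTENCES_PER_VERB * NUM_VERBS  # 25,000

# Toy usage with invented sentences and placeholder sense IDs.
corpus = [
    AnnotatedSentence("comer", "comer_1", "El niño come una manzana.",
                      [ArgumentHead("niño", "SENSE-A"), ArgumentHead("manzana", "SENSE-B")]),
    AnnotatedSentence("comer", "comer_1", "La vaca come hierba.",
                      [ArgumentHead("vaca", "SENSE-C"), ArgumentHead("hierba", "SENSE-D")]),
]
index = group_by_verb_sense(corpus)
print(heads_for_sense(index, "comer", "comer_1"))
```

Keying the index on the (verb, sense) pair reflects the abstract's statement that annotations are structured by verb sense rather than by verb lemma alone.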
Keywords: Corpus linguistics; semantic annotation; WordNet; SenSem corpus.