LMF version of the SenSem Catalan Data Base



http://grial.uab.es/

ID:

http://hdl.handle.net/10230/17119

This is the LMF version of the SenSem database created by the Spanish Inter-University Research Group GRIAL.
As part of the SenSem project, a corpus of sentences annotated at the semantic and syntactic levels was created.
The source corpus comprises around 13 million words extracted from the online versions of a Spanish newspaper. From this corpus, 25,000 sentences were randomly selected, 100 for each of the 250 most frequent verbs in current Spanish. Each sentence has been labelled with the verb sense it exemplifies, the type of complements it takes (arguments or adjuncts), and their syntactic category and function; each argument has also been labelled with a semantic role. Each sentence has additionally been annotated for its semantics, covering both aspectual information and the type of construction expressed.
From this annotated corpus, a lexical database of verbs was created that collects all of the information above. The unit of description for the verbs is the sense. Each verb description includes argument structure, incorporating subcategorization patterns with their frequencies, semantic roles, and information about sentence semantics.
The lexicon and the corpus are linked at the sense level and together make up what we call the data bank of the sentential semantics of Spanish verbs. Both resources are available on the web and will form an important source of linguistic information, which we hope will be useful in different areas of natural language processing and in linguistic research in general.
The LMF conversion was carried out by the Universitat Pompeu Fabra.


Distribution

Availability

Available - Unrestricted Use

Licence

GPL

Download location: hidden

Distribution Access/Medium: Downloadable

Contact Persons

Ana Fernandez Montraveta

Jorge Vivaldi

text

Lexical Conceptual Resource General Information

Lexicon

Encoding

Encoding level: Morphology

Linguistic information: Definition/gloss, Lemma, Semantics - Cross References, Semantics - Semantic Roles, Usage - Examples, Usage - Frequency

Conformance to standards/best practices: LMF

Creation

Creation mode: Automatic

Original Sources

SenSem data base located at http://grial.uab.es/...

Languages

Catalan

Linguality

Linguality type: Monolingual

Text Format

text/xml

Size

1,210 Semantic Units

321 Entries
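The size figures above can in principle be checked against the XML file itself. The following is a minimal sketch, assuming the download follows the usual LMF element names (`LexicalResource`, `Lexicon`, `LexicalEntry`, `Lemma`, `Sense`, and `feat` elements with `att`/`val` attributes) and that each semantic unit corresponds to one `Sense` element. The sample document, its identifiers, and the helper names `count_units` and `lemmas` are hypothetical and are not taken from the actual resource.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature lexicon; the real file's ids and features may differ.
SAMPLE = """<LexicalResource>
  <Lexicon>
    <feat att="language" val="ca"/>
    <LexicalEntry id="le_1">
      <Lemma><feat att="writtenForm" val="parlar"/></Lemma>
      <Sense id="s_1"/>
      <Sense id="s_2"/>
    </LexicalEntry>
    <LexicalEntry id="le_2">
      <Lemma><feat att="writtenForm" val="obrir"/></Lemma>
      <Sense id="s_3"/>
    </LexicalEntry>
  </Lexicon>
</LexicalResource>"""


def count_units(xml_text):
    """Return (number of LexicalEntry elements, number of Sense children)."""
    root = ET.fromstring(xml_text)
    entries = root.findall(".//LexicalEntry")
    # Assumption: senses are direct children of LexicalEntry (no nesting).
    senses = root.findall(".//LexicalEntry/Sense")
    return len(entries), len(senses)


def lemmas(xml_text):
    """Return the writtenForm value of each entry's Lemma, in document order."""
    root = ET.fromstring(xml_text)
    forms = []
    for entry in root.iter("LexicalEntry"):
        for feat in entry.findall("./Lemma/feat"):
            if feat.get("att") == "writtenForm":
                forms.append(feat.get("val"))
    return forms


if __name__ == "__main__":
    print(count_units(SAMPLE))
    print(lemmas(SAMPLE))
```

To run the same check on the downloaded file, read its contents and pass them to `count_units`. If the counts differ from those listed in this record, the likely cause is that the assumed mapping between semantic units and `Sense` elements does not hold for this resource.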

Character encoding

UTF-8

Domains

general

Resource Creation

Resource Creator

Ana Fernandez Montraveta

Irene Castellón

Grup de Recerca Interuniversitari en Aplicacions Lingüístiques (GRIAL)

Glòria Vázquez

Creation lasted: 03/10/2012 - 04/30/2012

Funding Project

METANET4U (METANET4U)

URL: http://metanet4u.eu/

Funding Types: EU Funds, Own Funds

Funders: Universitat Pompeu Fabra, European Union

Funding Country: Spain

SenSem (SenSem)

URL: http://grial.uab.es/...

Funding Type: National Funds

Funder: España. Ministerio de Educación y Ciencia

Funding Country: Spain

Metadata

Created: 05/10/2012

Last Updated: 04/10/2012

Source: METASHARE v2.1

Metadata Creator

Marta Villegas

Version

Version: 0.1

Last Updated: 04/10/2012

Validation

Validated

Type of Validation: Formal

Validation Mode: Automatic

Mode Details: The documents validate against the LMF DTD v16
