CINTIL-LogicalFormBank

The CINTIL-LogicalFormBank (Branco, 2009; Branco et al., 2011) is a corpus of semantic dependencies for sentences from Portuguese texts. It comprises 10,039 sentences and 110,166 tokens drawn from different sources and domains: news (8,861 sentences; 101,430 tokens) and novels (399 sentences; 3,082 tokens) (see 3.2.). The remaining 779 sentences (5,654 tokens) are used for regression testing of the computational grammar that supported the annotation of the corpus (cf. section 4.6.).
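The figures quoted above can be cross-checked with a few lines of Python. This is only a sketch: the dictionary keys and the function name are illustrative and not part of the resource, and the numbers are copied from the description. It suggests that the stated totals already include the regression-test sentences.

```python
# Per-domain (sentences, tokens) counts, copied from the description above.
# The keys "news", "novels" and "test" are illustrative labels only.
DOMAINS = {
    "news": (8861, 101430),
    "novels": (399, 3082),
    "test": (779, 5654),
}


def totals(domains):
    """Sum sentence and token counts over all domains."""
    sentences = sum(s for s, _ in domains.values())
    tokens = sum(t for _, t in domains.values())
    return sentences, tokens


if __name__ == "__main__":
    # Compare against the stated totals: 10,039 sentences and 110,166 tokens.
    print(totals(DOMAINS))
```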

Each sentence in the CINTIL-LogicalFormBank is paired with an MRS (Minimal Recursion Semantics) representation of its semantic relations, obtained through semi-automatic analysis with double-blind annotation followed by adjudication (see Branco and Costa, 2008, for a full description of the process). The resulting dataset contains one information level: semantic relations.

The main motivation for creating this resource was to build a high-quality data set with semantic information that could support the development of a wide range of automatic NLP resources and tools for Portuguese.

The development of this resource started under the project SemanticShare – Resources and Tools for Semantic Processing (http://nlx.di.fc.ul.pt/projects.html), whose main goal was to generate a corpus of Portuguese with deep linguistic annotation and manually verified grammatical representations.

Distribution

Availability

Available - Restricted Use

Licence

Proprietary

Restrictions: Academic - Non Commercial Use

User Nature: Academic

Distribution Access/Medium: Downloadable

Licensors:

António Branco

Distribution rights holders:

António Branco

IPR Holder

António Branco

Contact Person

António Branco

text

Monolingual text corpus

Languages

Portuguese

Linguality

Linguality type: Monolingual

Text Format

text/xml

Size

10,039 Sentences

Character encoding

UTF-8

Domains

Novels (399 Sentences)

Test (779 Sentences)

News (8,861 Sentences)

Modalities

Written Language

Geographic coverage

Portugal (10,039 Sentences)

Resource Creation

Resource Creator

António Branco

Metadata

Created: 11/23/2012

Last Updated: 11/23/2012

Source: METANET4U

META-SHARE

Metadata Language: English (en)

Metadata Creator

Catarina Carvalheiro

Version

Version: 1.0

Last Updated: 11/23/2012

Usage

Foreseen Use - NLP Applications

Use NLP Specific: Parsing

Actual Use - NLP Applications

Use NLP Specific: Parsing

Documentation

Tool Documentation: Online

Samples Location: http://194.117.45.19...

Document Type: Other

Catarina Carvalheiro, CINTIL_LogicalFormBank Narrative Description, http://194.117.45.19...

Document Type: In Proceedings

António Branco, LogicalFormBanks, the Next Generation of Semantically Annotated Corpora: key issues in construction methodology, http://www.di.fc.ul.... , 2009

Book Title: Proceedings of International Joint Conference on Intelligent Information Systems

Document Language: English
