MULTEXT JOC Corpus



Corpus JOC MULTEXT

http://catalog.elra.info/product_info.php?products_id=534

ID: ELRA-W0017

This CD-ROM contains a part of the corpus developed in the MULTEXT project financed by the European Commission (LRE 62-050). This part contains raw, tagged and aligned data from the Written Questions and Answers of the Official Journal of the European Community. The corpus contains approx. 5 million words in English, French, German, Italian and Spanish (approx. 1 million words per language). About 800,000 words were grammatically tagged and manually checked for English, French, Italian and Spanish, i.e. roughly 200,000 words per language. The same subset for French, German, Italian and Spanish was aligned to English at the sentence level.

The JOC corpus is delivered in a format conformant to the Corpus Encoding Standard (CES) at each level of treatment:

paragraph annotation level, conformant to the CESDOC specifications (1 M words * 5 languages);
morpho-syntactic annotation level (PoS Tagging), conformant to CESANA specifications (200,000 words * 4 languages);
parallel text alignment at sentence level, conformant to CESALIGN specifications (200,000 words * 4 languages).
Additional information: http://www.lpl.univ-aix.fr/projects/multext
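
The word counts quoted for the three annotation levels can be cross-checked with a short calculation. The following is a minimal sketch that only restates the approximate figures given above; the dictionary keys and the function name are labels chosen for this illustration and do not correspond to anything on the CD-ROM.

```python
# Annotation levels as described above:
# (CES specification, approx. words per language, number of languages).
# The keys are hypothetical labels chosen for this sketch.
LEVELS = {
    "paragraph": ("CESDOC", 1_000_000, 5),
    "morphosyntactic": ("CESANA", 200_000, 4),
    "alignment": ("CESALIGN", 200_000, 4),
}


def total_words(level: str) -> int:
    """Return the approximate total word count for one annotation level."""
    _spec, words_per_language, languages = LEVELS[level]
    return words_per_language * languages


if __name__ == "__main__":
    for name, (spec, _, _) in LEVELS.items():
        print(f"{name} ({spec}): approx. {total_words(name):,} words")
```

Under these figures, the paragraph level accounts for the approx. 5 million words of the full corpus, and the tagged and aligned levels each cover the approx. 800,000-word subset.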




Distribution

Availability

Available - Restricted Use

Start date: 11/23/1998

Licence

ELRA END USER | Restrictions: Academic - Non Commercial Use | For Non Members of ELRA | User Nature: Commercial
ELRA VAR | Restrictions: Commercial Use | For Members of ELRA | User Nature: Commercial
ELRA END USER | Restrictions: Academic - Non Commercial Use | For Members of ELRA | User Nature: Commercial
ELRA VAR | Restrictions: Commercial Use | For Members of ELRA | User Nature: Academic
ELRA END USER | Restrictions: Academic - Non Commercial Use | For Members of ELRA | User Nature: Academic
ELRA VAR | Restrictions: Commercial Use | For Non Members of ELRA | User Nature: Commercial
ELRA VAR | Restrictions: Commercial Use | For Non Members of ELRA | User Nature: Academic
ELRA END USER | Restrictions: Academic - Non Commercial Use | For Non Members of ELRA | User Nature: Academic

Contact Person

Mapelli Valérie

text

Multilingual text corpus

Languages

English
French
German
Italian
Spanish

Variety: Castilian (Type: Dialect) (2 Gb)

Linguality

Linguality type: Multilingual

Size

5,000,000 Words

Resource Creation

Funding Project

MULTEXT (LRE 62-050)

Funding Type: EU Funds

Metadata

Created: 05/12/2005

Version

Version: 1.0

Last Updated: 01/24/2013
