ECI-ELSNET Italian & German tagged sub-corpus
ID: ELRA-W0005
The objective is to provide a small but fine-grained morphosyntactically tagged corpus of 50,000 running words for each of the two languages (Italian and German), to be used in research on tagging methods and models. The German text comes from the Frankfurter Rundschau, extracted from the ECI corpus; the Italian material comes from the Italian corpus of ILC - CNR. The German data covers several domains: Economy (17,000 word forms), Politics (14,000 word forms), Culture (18,000 word forms), Sports (9,000 word forms), and Local Events (8,500 word forms). The Italian data has a comparable distribution. Word occurrences are tagged with very fine-grained tagsets based on the EAGLES morphosyntactic guidelines.
The tagging was done automatically and then checked manually. The CD-ROM contains: the text in SGML format; the DBT software, which supports browsing and other operations on the annotated text; and the EAGLES morphosyntactic guidelines.
Italian and German tagged sub-corpus. The German data are extracted from the German daily newspaper Frankfurter Rundschau in the ECI corpus, and the Italian data from ILC/CNR; both cover domains such as economy, politics, culture, sports, and local events.
The corpus covers the following domains: Economy (17,000 words), Politics (14,000 words), Culture (18,000 words), Sports (9,000 words), Local Events (8,500 words).
The word tagging, done automatically following the EAGLES recommendations for morphosyntactic-level encoding, has been checked manually.