Croatian-English Parallel Web Corpus



hrenWaC

http://www.nljubesic.net/resources/corpora/hrenwac/

ID: 308

Croatian-English Parallel Web Corpus is a collection of parallel Croatian-English texts crawled from the .hr domain. The corpus was collected automatically by finding online documents in English that are parallel to documents already crawled in hrWaC. The parallelism of each text pair was scored on a scale from 0 to 1, and the selection threshold was set empirically to 0.52. The resulting collection of parallel-text candidates was then manually inspected to keep only genuinely parallel texts. The initial crawled corpus had ca. 253,000 sentence/translation-unit pairs (ca. 8 Mw per language), while the manual check left 99,001 sentence/translation-unit pairs. The corpus is distributed under the CC-BY-SA licence.
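The threshold-based selection step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Candidate` record, its field names, the example URLs and scores, and the choice of an inclusive comparison (`>=`) are all assumptions made for illustration. Only the threshold value 0.52 and the 0 to 1 scale come from the description.

```python
from dataclasses import dataclass

# Selection threshold stated in the corpus description (scale 0..1).
THRESHOLD = 0.52


@dataclass
class Candidate:
    """Hypothetical record for one Croatian/English document pair."""
    hr_url: str
    en_url: str
    score: float  # assumed parallelism score in [0, 1]


def select_candidates(candidates, threshold=THRESHOLD):
    """Keep pairs whose score reaches the threshold.

    Whether the original selection was inclusive (>=) or strict (>)
    is not stated in the record; inclusive is assumed here.
    """
    return [c for c in candidates if c.score >= threshold]


# Made-up example data, for illustration only.
pairs = [
    Candidate("http://example.hr/o-nama", "http://example.hr/en/about", 0.81),
    Candidate("http://example.hr/vijesti", "http://example.hr/en/contact", 0.30),
    Candidate("http://example.hr/usluge", "http://example.hr/en/services", 0.52),
]

selected = select_candidates(pairs)
for c in selected:
    print(f"{c.score:.2f}\t{c.hr_url}\t{c.en_url}")
```

Pairs that pass this automatic filter are only candidates; according to the description, they were still inspected manually before entering the final corpus.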


Distribution

Availability

Available - Restricted Use

Licence

CC-BY-SA

Restrictions: Attribution, Share Alike

Execution location: hidden

Distribution Access/Medium: Downloadable

Distribution rights holders:

University of Zagreb, Faculty of Humanities and Social Sciences

IPR Holder

University of Zagreb, Faculty of Humanities and Social Sciences

Contact Person

Nikola Ljubešić

text

Bilingual text corpus

Languages

English

Language Script: Latn

Croatian

Language Script: Latn

Linguality

Linguality type: Bilingual

Size

99 001 Units

Character encoding

UTF-8

Annotation

Segmentation

Segmentation level: Sentence

Alignment

Segmentation level: Sentence

Resource Creation

Resource Creator

Univ. of Zagreb, Faculty of Humanities and Social Sciences, Depts. of Linguistics & Information Sci.

Creation started: 04/01/2012

Funding Project

Central and South-East European Resources (CESAR)

URL: http://www.cesar-pro...

Funding Types: EU Funds, National Funds

Funders: European Commission (50%), University of Zagreb, Faculty of Humanities and Social Sciences (50%)

Project duration: 02/01/2011 - 01/31/2013

Metadata

Created: 07/30/2012

Last Updated: 02/04/2013

Metadata Creator

Marko Tadić

Version

Version: 1.0

Last Updated: 07/30/2012
