PELCRA Polish-English parallel corpora (CC-BY-NC)



PELCRA-PAR-2

http://pelcra.pl/res/parallel/pelcra-par-2

ID: 502

A subset of the PELCRA Polish parallel corpora licensed under the CC-BY-NC license. This resource contains 257 texts from the PAS Academia journal. The headers of individual texts may override this licensing information.


Distribution

Availability

Available - Restricted Use

Start date: 11/30/2011

Licence

CC-BY-NC

Restrictions: Other

Download location: hidden

Distribution Access/Medium: Downloadable

Licensors:

Piotr Pęzik

Distribution rights holders:

University of Łódź

IPR Holder

Polish Academy of Sciences

Contact Persons

Piotr Pęzik

Łukasz Dróżdż

text

Bilingual text corpus

Languages

English (387,000 Tokens)

Polish (323,000 Tokens)

Linguality

Linguality type: Bilingual

Multi-linguality type: Parallel (The texts were manually aligned at sentence level using the MemoQ segment alignment tool (http://kilgray.com/products/memoq).)
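The record states that the texts are sentence-aligned in MemoQ, stored as XML, and validated against TEI P5 and XLIFF schemas. The Python sketch below shows one way to read aligned sentence pairs, under a clearly stated assumption: that each pair is stored as an XLIFF 1.2 `trans-unit` with `source` and `target` children. The actual PELCRA files use custom extensions that this record does not describe, so the element layout, the language direction, the function name `read_sentence_pairs`, and the sample data are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: aligned pairs follow XLIFF 1.2 (namespace below). The real
# PELCRA files may add custom extensions or use a different layout.
XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}


def read_sentence_pairs(xml_data):
    """Return (source, target) sentence pairs from XLIFF 1.2 text or bytes.

    Hypothetical helper for illustration, not part of the PELCRA tooling.
    """
    root = ET.fromstring(xml_data)
    pairs = []
    for unit in root.iterfind(".//x:trans-unit", XLIFF_NS):
        src = unit.find("x:source", XLIFF_NS)
        tgt = unit.find("x:target", XLIFF_NS)
        if src is None or tgt is None:
            continue  # skip units that lack one side of the alignment
        # itertext() flattens any inline markup inside a segment
        pairs.append(("".join(src.itertext()).strip(),
                      "".join(tgt.itertext()).strip()))
    return pairs


# Hypothetical sample; the source/target languages are assumed, not documented.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file source-language="pl" target-language="en" datatype="plaintext" original="sample">
    <body>
      <trans-unit id="1">
        <source>Nauka jest ważna.</source>
        <target>Science is important.</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""

if __name__ == "__main__":
    for src, tgt in read_sentence_pairs(SAMPLE):
        print(src, "|", tgt)
```

To read a file on disk, its raw bytes can be passed directly, e.g. `read_sentence_pairs(open(path, "rb").read())`, where `path` is a hypothetical file location.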

Text Format

text/xml (710,000 Tokens)

Size

710,000 Tokens

Character encoding

UTF-8 (710,000 Tokens)

Domains

science (710,000 Tokens)

Modalities

Other

Annotation

Segmentation

Segmentation level: Sentence

Format: text/xml

Standard practices conformance: TEI

Annotation Mode: Mixed

Annotation Tools:

MemoQ

Start date: 08/01/2011

End date: 09/30/2011

Size: 710,000 Tokens

Annotators:

Łukasz Dróżdż

Piotr Pęzik

Alignment

Segmentation level: Sentence

Format: text/xml

Standard practices conformance: TEI

Annotation Mode: Manual

Annotation Tools:

MemoQ

Start date: 08/01/2011

End date: 09/30/2011

Size: 710,000 Tokens

Annotators:

Łukasz Dróżdż

Piotr Pęzik

Time Coverage

2005-2010 (710,000 Tokens)

Creation

Creation mode details: Semi-automatic acquisition and processing.

Creation mode: Mixed

Original Sources

http://www.academia....

Creation Tools

in-house software

Resource Creation

Resource Creator

Piotr Pęzik

University of Łódź

Łukasz Dróżdż

Creation lasted: 08/01/2011 - 09/30/2011

Funding Project

Central and South-East European Resources (CESAR - 271022)

URL: http://www.meta-net....

Funding Type: EU Funds

Funder: DG INFSO of the European Commission

Funding Country: European Union

Project duration: 02/01/2011 - 01/31/2013

Metadata

Created: 10/24/2011

Last Updated: 01/22/2013

Source: CESAR

Metadata Language: English (en)

Metadata Creator

Łukasz Dróżdż

Piotr Pęzik

Version

Version: 1.0

Revision: compilation of the corpus

Last Updated: 09/30/2011

Validation

Validated (710,000 Tokens)

Type of Validation: Formal

Validation Mode: Automatic

Mode Details: Validation of the XML structure in accordance with the TEI P5 and XLIFF schemas with custom PELCRA extensions.

Extent: Full

Validation Tools:

xmllint

Validator

Łukasz Dróżdż

Piotr Pęzik
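Formal validation was done with xmllint against the TEI P5 and XLIFF schemas; with xmllint this is typically run with `--noout` plus `--relaxng` or `--schema`, but the schema files used are not named in this record. As a lightweight illustration only, the Python sketch below checks that a document is well-formed and that every `trans-unit` has both a `source` and a `target`. It assumes the XLIFF 1.2 namespace, the function name `check_document` is hypothetical, and it is not a substitute for validating against the actual schemas.

```python
import xml.etree.ElementTree as ET

# Assumption: XLIFF 1.2 namespace; the PELCRA custom extensions are not modelled.
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def check_document(xml_data):
    """Return a list of problems; an empty list means these checks passed.

    Only well-formedness and source/target pairing are tested. This does not
    replace schema validation against TEI P5 or XLIFF (e.g. with xmllint).
    """
    try:
        root = ET.fromstring(xml_data)
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    problems = []
    for unit in root.iter(f"{{{XLIFF_NS}}}trans-unit"):
        uid = unit.get("id", "?")
        for part in ("source", "target"):
            if unit.find(f"{{{XLIFF_NS}}}{part}") is None:
                problems.append(f"trans-unit {uid}: missing <{part}>")
    return problems
```

Checking the pairing in addition to well-formedness targets the property a sentence-aligned corpus most depends on: every segment has a counterpart.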

Documentation

http://pelcra.pl/pro...
