eSCAPE: a Large-scale Synthetic Corpus for Automatic Post-Editing


Training of automatic post-editing models on (source, target, reference) triplets.
A set of 7,258,533 English-German and 3,357,371 English-Italian triplets (source, target, reference). For each language pair, the data set is available in two versions: one in which the target segments are produced by a neural MT system, and another in which they are produced by a phrase-based MT system. The MT systems used to generate the targets are instances of the open-source Modern MT tool developed by the European project MMT. The original (source, reference) pairs are derived from a collection of corpora from different domains available in the OPUS repository.
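As an illustration of the triplet structure, the sketch below pairs up three line-aligned streams (source, MT target, human reference) into triplets. The on-disk layout is an assumption: this description does not specify the file format, and the `Triplet` type and `read_triplets` helper are hypothetical names introduced here, not part of the corpus distribution.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, NamedTuple


class Triplet(NamedTuple):
    """One automatic post-editing training example (hypothetical type)."""
    source: str     # original source-language segment
    target: str     # MT output (neural or phrase-based version)
    reference: str  # human reference translation


_MISSING = object()  # sentinel marking a stream that ended early


def read_triplets(
    sources: Iterable[str],
    targets: Iterable[str],
    references: Iterable[str],
) -> Iterator[Triplet]:
    """Yield triplets from three line-aligned streams.

    Assumes one segment per line and identical line counts in all
    three streams; raises ValueError if the lengths differ.
    """
    for src, mt, ref in zip_longest(sources, targets, references,
                                    fillvalue=_MISSING):
        if src is _MISSING or mt is _MISSING or ref is _MISSING:
            raise ValueError("streams are not line-aligned")
        yield Triplet(src.rstrip("\n"), mt.rstrip("\n"), ref.rstrip("\n"))
```

Open file objects can be passed directly as the three arguments, since iterating over a text file yields its lines. The length check is a design choice: a silent truncation at the shortest stream would hide misaligned data, which matters for a corpus of this size.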
