eSCAPE: a Large-scale Synthetic Corpus for Automatic Post-Editing


The corpus is intended for training Automatic Post-Editing (APE) models on (source, target, reference) triplets.
A set of 7,258,533 English-German and 3,357,371 English-Italian triplets (source, target, reference). For each language pair, the data set is available in two versions: one in which the target segments are produced by a neural MT system, and another in which they are produced by a phrase-based MT system. The MT systems used to generate the targets are instances of the open-source Modern MT tool developed by the European project MMT. The original (source, reference) pairs are derived from a collection of corpora from different domains available in the OPUS repository.
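The triplet structure can be illustrated with a minimal Python sketch. It assumes that each version of the corpus (for example, the English-German neural MT version) is stored as three line-aligned plain-text files holding the source, the MT target, and the reference, one segment per line. That file layout and the names `APETriplet`, `iter_triplets`, and `read_triplets` are hypothetical and are not part of the eSCAPE release.

```python
from dataclasses import dataclass
from itertools import zip_longest
from typing import Iterable, Iterator


@dataclass(frozen=True)
class APETriplet:
    """One APE training example (hypothetical container, not part of eSCAPE)."""
    source: str     # English source segment
    target: str     # MT output (neural or phrase-based version)
    reference: str  # reference translation, used as the post-edit


def iter_triplets(src_lines: Iterable[str],
                  tgt_lines: Iterable[str],
                  ref_lines: Iterable[str]) -> Iterator[APETriplet]:
    """Pair up three line-aligned streams into triplets.

    Raises ValueError as soon as one stream runs out before the others,
    since a length mismatch means the segments are no longer aligned.
    """
    for n, row in enumerate(zip_longest(src_lines, tgt_lines, ref_lines), start=1):
        if any(x is None for x in row):
            raise ValueError(f"inputs are not line-aligned (mismatch at line {n})")
        src, tgt, ref = (x.rstrip("\n") for x in row)
        yield APETriplet(src, tgt, ref)


def read_triplets(src_path: str, tgt_path: str, ref_path: str) -> Iterator[APETriplet]:
    """Stream triplets from three UTF-8 files (assumed layout: one segment per line)."""
    with open(src_path, encoding="utf-8") as fs, \
         open(tgt_path, encoding="utf-8") as ft, \
         open(ref_path, encoding="utf-8") as fr:
        yield from iter_triplets(fs, ft, fr)
```

Because `read_triplets` is a generator, it streams segments instead of loading millions of lines into memory; the paths passed to it are placeholders for whatever files a given release actually ships.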

