QT21 WMT17 Human Post-Edited data set



Training of Automatic Post-editing and Quality Estimation components / Quality Estimation / Error Analysis.
A set of 10,800 Human Post-Edited (HPE) quadruplets for three language pairs on WMT17 news task data. Each quadruplet consists of (source, reference, target, HPE). For each language pair, the target segments were produced on the WMT17 news task by the three best WMT17 systems for that language pair, and each translation engine provided 1,200 segments.
The translations (targets) were generated using "1 62.0 0.308 uedin-nmt", "3 55.9 0.111 limsi-factored-norm" and "54.1 0.050 CU-Chimera" for En-Cz; "69.8 0.139 uedin-nmt", "66.7 0.022 KIT" and "66.0 0.003 RWTH-nmt-ensemb" for En-De; and "54.4 0.196 tilde-nc-nmt-smt", "50.8 0.075 limsi-fact-norm" and "50.0 0.058 usfd-cons-qt21" for En-Lv.
The En-De HPEs were collected by professional translators from Text&Form, the En-Lv HPEs by professional translators from Tilde, and the En-Cz HPEs by professional translators from Traductera.
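As a rough illustration of the structure described above, the following Python sketch models one quadruplet and checks that three language pairs, three systems per pair and 1,200 segments per system give the stated total of 10,800. The class name, field names and helper functions are hypothetical and chosen only for this example; the sketch does not assume anything about the file layout of the distributed data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quadruplet:
    """One item of the data set: (source, reference, target, HPE).

    Field names are illustrative, not taken from the distributed files.
    """
    source: str     # source-language segment
    reference: str  # reference translation
    target: str     # machine translation output
    hpe: str        # human post-edit of the target


# Figures stated in the description of the data set.
LANGUAGE_PAIRS = ("En-Cz", "En-De", "En-Lv")
SYSTEMS_PER_PAIR = 3
SEGMENTS_PER_SYSTEM = 1200


def expected_total() -> int:
    """Number of quadruplets implied by the figures above."""
    return len(LANGUAGE_PAIRS) * SYSTEMS_PER_PAIR * SEGMENTS_PER_SYSTEM


def is_unedited(q: Quadruplet) -> bool:
    """True when the post-editor left the machine output unchanged."""
    return q.target.strip() == q.hpe.strip()


if __name__ == "__main__":
    print(expected_total())
```

A helper such as `is_unedited` is one simple way to separate segments that needed no correction from those that did, which can be a useful first step for the quality estimation and error analysis uses listed above.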

IMPORTANT LEGAL NOTICE (This dataset is provided under the following terms of use)
TAUS Terms of Use (https://lindat.mff.cuni.cz/repository/xmlui/page/licence-TAUS_QT21).
TAUS grants to QT21 User access to the WMT Data Set with the following rights:
i) the right to use the target side of the translation units into a commercial product, provided that QT21 User may not resell the WMT Data Set as if it is its own new translation;
ii) the right to make Derivative Works; and
iii) the right to use or resell such Derivative Works commercially and for the following goals:
i) research and benchmarking;
ii) piloting new solutions; and
iii) testing of new commercial services.
