QT21 WMT17 Human Error-Annotated data set



Training of Automatic Post-editing and Quality Estimation components / Quality Estimation / Error Analysis.
Set of 3,600 WMT17 Human Error Annotated (HEA) quadruplets for 3 language pairs (English to German, English to Czech, English to Latvian) and 9 translation engines. Each quadruplet consists of (source, reference, HPE, HEA). The source data comes from the WMT17 news task. A total of 9 translation engines, 3 per language pair, were used to produce the targets that were post-edited: “1 62.0 0.308 uedin-nmt”, “3 55.9 0.111 limsi-factored-norm” and “54.1 0.050 CU-Chimera” for En-Cz; “69.8 0.139 uedin-nmt”, “66.7 0.022 KIT” and “66.0 0.003 RWTH-nmt-ensemb” for En-De; and “54.4 0.196 tilde-nc-nmt-smt”, “50.8 0.075 limsi-fact-norm” and “50.0 0.058 usfd-cons-qt21” for En-Lv. For each translation engine, 200 target segments were post-edited and then error-annotated by 2 different professional translators. En-De HEAs were collected by professional translators from Text&Form. En-Lv HEAs were collected by professional translators from Tilde. En-Cz HEAs were collected by professional translators from Aspena. All data is provided by the EU project QT21 (http://www.qt21.eu/).
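The stated total of 3,600 quadruplets can be checked against the counts given in the description. The sketch below is only a sanity check, not part of the dataset or its tooling. It assumes that each of the 2 annotations of a post-edited segment counts as one quadruplet; the description does not say this explicitly. The names `ENGINES`, `SEGMENTS_PER_ENGINE`, `ANNOTATORS_PER_SEGMENT` and `expected_quadruplets` are hypothetical.

```python
# Engine names as listed in the description, grouped by language pair.
ENGINES = {
    "En-Cz": ["uedin-nmt", "limsi-factored-norm", "CU-Chimera"],
    "En-De": ["uedin-nmt", "KIT", "RWTH-nmt-ensemb"],
    "En-Lv": ["tilde-nc-nmt-smt", "limsi-fact-norm", "usfd-cons-qt21"],
}

SEGMENTS_PER_ENGINE = 200    # post-edited target segments per engine
ANNOTATORS_PER_SEGMENT = 2   # assumption: one quadruplet per annotator


def expected_quadruplets(engines=ENGINES):
    """Return the number of HEA quadruplets implied by the description."""
    total_engines = sum(len(names) for names in engines.values())
    return total_engines * SEGMENTS_PER_ENGINE * ANNOTATORS_PER_SEGMENT


if __name__ == "__main__":
    print(expected_quadruplets())
```

Under that assumption, 9 engines × 200 segments × 2 annotators matches the 3,600 quadruplets stated above.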
IMPORTANT LEGAL NOTICE (This dataset is provided under the following terms of use)
TAUS Terms of Use (https://lindat.mff.cuni.cz/repository/xmlui/page/licence-TAUS_QT21).
TAUS grants to QT21 User access to the WMT Data Set with the following rights:
i) the right to use the target side of the translation units into a commercial product, provided that QT21 User may not resell the WMT Data Set as if it is its own new translation;
ii) the right to make Derivative Works; and
iii) the right to use or resell such Derivative Works commercially and for the following goals:
i) research and benchmarking;
ii) piloting new solutions; and
iii) testing of new commercial services.
