Bilingual term pairs extracted from comparable news feed resources using the TaaS Bilingual Term Extraction System.



The resource contains bilingual term pairs automatically extracted from comparable resources found in news feeds using the TaaS Bilingual Term Extraction System. The workflow for bilingual term extraction consisted of:
1) RSS News Collector for collection of comparable corpora from RSS news feeds and for plaintext extraction.
2) DictMetric for cross-lingual document-level alignment of the collected comparable corpora.
3) Tilde's Wrapper System for CollTerm for identification of terms in plaintext documents.
4) Term normalisation tools developed by Tilde and the University of Sheffield for acquisition of normalised (canonical) forms of terms that occur in different surface forms.
5) MPAligner for extraction of bilingual term pairs (term alignment) from the aligned, term-tagged document pairs.
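
The data flow of steps 2 to 5 can be illustrated with a minimal Python sketch. It is not the TaaS implementation: every function, name, threshold and data item below is a hypothetical stand-in, and the real tools (DictMetric, CollTerm, the normalisation tools, MPAligner) use far more sophisticated methods. Step 1 (RSS collection) is skipped, and the documents are given as in-memory strings.

```python
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    lang: str
    text: str


def align_documents(src_docs, tgt_docs, dictionary, threshold=0.5):
    """Step 2 stand-in: pair documents by dictionary overlap (assumed heuristic)."""
    pairs = []
    for s in src_docs:
        s_words = set(s.text.lower().split())
        translatable = [w for w in s_words if w in dictionary]
        if not translatable:
            continue
        for t in tgt_docs:
            t_words = set(t.text.lower().split())
            hits = sum(1 for w in translatable if dictionary[w] in t_words)
            if hits / len(translatable) >= threshold:
                pairs.append((s, t))
    return pairs


def identify_terms(doc, term_list):
    """Step 3 stand-in: find known term strings in the plaintext."""
    text = doc.text.lower()
    return [term for term in term_list if term in text]


def normalise(term, canonical_forms):
    """Step 4 stand-in: map a surface form to its canonical form."""
    return canonical_forms.get(term, term)


def align_terms(src_terms, tgt_terms, dictionary):
    """Step 5 stand-in: pair terms whose translated word sets are equal."""
    pairs = []
    for s in src_terms:
        translated = {dictionary.get(w, w) for w in s.split()}
        for t in tgt_terms:
            if translated == set(t.split()):
                pairs.append((s, t))
    return pairs


def extract_term_pairs(src_docs, tgt_docs, dictionary, src_terms, tgt_terms, canonical):
    """Chain steps 2 to 5 and return the sorted, de-duplicated term pairs."""
    results = set()
    for s, t in align_documents(src_docs, tgt_docs, dictionary):
        s_norm = [normalise(x, canonical) for x in identify_terms(s, src_terms)]
        t_norm = [normalise(x, canonical) for x in identify_terms(t, tgt_terms)]
        results.update(align_terms(s_norm, t_norm, dictionary))
    return sorted(results)


# Invented toy data (English-Spanish), for illustration only.
src_docs = [Document("en-1", "en", "Central banks raised interest rates")]
tgt_docs = [Document("es-1", "es", "Los bancos centrales subieron los tipos")]
dictionary = {"central": "central", "bank": "banco",
              "raised": "subieron", "rates": "tipos"}
canonical = {"central banks": "central bank",
             "bancos centrales": "banco central"}

pairs = extract_term_pairs(src_docs, tgt_docs, dictionary,
                           ["central banks"], ["bancos centrales"], canonical)
print(pairs)  # [('central bank', 'banco central')]
```

In this sketch, normalisation runs before term alignment so that inflected surface forms ("central banks", "bancos centrales") are paired through their canonical forms, which mirrors the order of steps 4 and 5 in the list above.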

