Bilingual Greek-English Comparable Corpus of News Texts
This is a bilingual (Greek-English) comparable corpus of news texts covering the following domains: Technology, Politics, Entertainment (Culture, Sports), News (Terrorism, Economy), and Science (Physics, Health). The corpus amounts to approximately 3.5 million words collected from various websites. The texts have been classified according to selected elements of the IPTC subject reference system and, consequently, vertical clustering and horizontal mapping have been performed.
- web news sites