Named Entities evaluation corpus for Serbian




The Named Entities evaluation corpus for Serbian (SrpNEval) consists of 2,000 short news items published by Serbian news agencies and daily newspapers in 2005 and 2006. They mostly cover Serbian domestic and foreign politics. The corpus contains 2,000 short news items, 3,343 sentences, 89,425 words, and 7,122 named entity tags. Named entities were recognized automatically and then corrected manually. The recognized named entity types are: persons (full names - 1,648, last names - 152, distinguished - 24), time expressions (dates - 366, time of day - 33), measurement expressions - 17, money expressions - 146, amount expressions - 322, percent expressions - 53, geopolitical names (countries - 1,390, cities - 1,440, hydronyms - 11, oronyms - 21), and organizations - 1,499. The corpus comes in two variants: Latin alphabet and Cyrillic alphabet.
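As a quick consistency check, the per-category counts listed above can be added up and compared with the reported total of 7,122 tags. The sketch below does only that arithmetic; the dictionary layout and the names TAG_COUNTS, group_totals and total_tags are illustrative assumptions and do not reflect the corpus's actual tag names or file format.

```python
# Per-category counts copied from the description above.
# The grouping and key names are hypothetical labels, not the corpus's tag set.
TAG_COUNTS = {
    "persons": {"full names": 1648, "last names": 152, "distinguished": 24},
    "time expressions": {"dates": 366, "time of day": 33},
    "measurement expressions": {"all": 17},
    "money expressions": {"all": 146},
    "amount expressions": {"all": 322},
    "percent expressions": {"all": 53},
    "geopolitical names": {
        "countries": 1390,
        "cities": 1440,
        "hydronyms": 11,
        "oronyms": 21,
    },
    "organizations": {"all": 1499},
}


def group_totals(counts):
    """Return the number of tags in each top-level group."""
    return {group: sum(sub.values()) for group, sub in counts.items()}


def total_tags(counts):
    """Return the total number of tags across all groups."""
    return sum(group_totals(counts).values())


if __name__ == "__main__":
    for group, n in group_totals(TAG_COUNTS).items():
        print(f"{group}: {n}")
    print("total:", total_tags(TAG_COUNTS))  # total: 7122
```

The computed total matches the 7,122 named entity tags stated in the description.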
