Polish Named Entity Gazetteer

The Polish Named Entity Gazetteer is an electronic lexicon containing partly inflected entries of Polish (and some foreign) proper names and named entity components: forenames and surnames, geographical names, organizational names, relational adjectives and inhabitant names derived from country names, and named entity triggers such as months, days, and positions. The resource was used for the automatic pre-annotation of named entities in the National Corpus of Polish (NKJP). It is available in two formats: (i) a textual format compliant with the SProUT text processing platform, and (ii) an LMF-compliant format. It contains about 45,000 lemmas and 135,000 word forms.

  • SProUT Extraction Platform adapted to Polish Named Entity Annotation with a fully automated rule-based NER system for Polish