Polish Named Entity Gazetteer
The Polish Named Entity Gazetteer is an electronic lexicon of partly inflected entries for Polish (and some foreign) proper names and named entity components: forenames and surnames, geographical names, organizational names, relational adjectives and inhabitant names derived from country names, and named entity triggers (months, days, positions, etc.). The resource was used for the automatic pre-annotation of the National Corpus of Polish (NKJP) at the level of named entities. It is available in (i) a textual format compliant with the SProUT text processing platform and (ii) an LMF-compliant format. It contains about 45,000 lemmas and 135,000 word forms.
- Publications of the Commission for Standardization of Geographic Names outside Polish Frontiers (Komisja Standaryzacji Nazw Geograficznych poza Granicami Rzeczypospolitej Polskiej), freely available at http://www.gugik.gov....
- The Polish Wikipedia (http://pl.wikipedia.org) – source of capitals of administrative units of different countries (http://pl.wikipedia....), rivers (http://pl.wikipedia...., http://pl.wikipedia...., etc.), historical regions of Europe (http://pl.wikipedia....), mountain chains (selected from several Wikipedia categories), adjectives and citizen names stemming from country names (http://pl.wiktionary...).
- The World Gazetteer (http://www.world-gaz...) – source of the list of 200 biggest Polish cities.
- SProUT Extraction Platform adapted to Polish Named Entity Annotation with a fully automated rule-based NER system for Polish