Corpus of Contemporary Serbian




The Corpus of Contemporary Serbian (SrpKor) consists of 4,925 texts with a total size of 118,767,279 words. It is lemmatized and PoS-tagged using TreeTagger. SrpKor comprises fiction written by Serbian authors in the 20th and 21st centuries (10,191,092 words), scientific texts from various domains in both the humanities and the sciences (3,542,169 words), legislative texts (6,874,318 words) and general texts (98,159,700 words). The general texts comprise daily news published in the newspaper "Politika" (2000-2002 and 2005-2010), texts from journals and magazines from 1991-2002 ("Danica", "Ebit", "Ekonomist", "Glasnik", "NIN", "Ilustrovana politika", "Kalibar", "Moje srce", "Mostovi", "Pravoslavlje", "Svet", "Teološki pogledi", "Trn", "Viva", "Republika"), internet portal texts from 2011-2012 (Peščanik), TANJUG agency news from 1995-1996, and feuilletons published in the newspapers "Politika" (2001-2003), "Večernje novosti" (2008-2011) and "Danas" (2002-2006).
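
The composition figures above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python, using only the word counts stated in the description. The dictionary keys and variable names are illustrative labels chosen here and are not part of any SrpKor distribution or tool.

```python
# Sub-corpus sizes in words, copied from the description above.
# The keys are illustrative labels, not official SrpKor identifiers.
subcorpora = {
    "fiction": 10_191_092,
    "scientific": 3_542_169,
    "legislative": 6_874_318,
    "general": 98_159_700,
}
stated_total = 118_767_279  # total corpus size given in the description

# The four sub-corpus sizes should add up to the stated total.
computed_total = sum(subcorpora.values())
assert computed_total == stated_total

# Share of each sub-corpus as a percentage of the whole corpus.
shares = {name: 100 * size / computed_total for name, size in subcorpora.items()}
for name, pct in shares.items():
    print(f"{name:<12} {pct:5.1f}%")
```

The check confirms that the four components sum exactly to the stated total, and the computed shares show that the general texts make up roughly four fifths of the corpus.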

  • TreeTagger
  • downloading from Web; retyping; scanning and OCR; obtaining from authors and translators
  • Corpus Query Processor (CQP)