Corpus of Spoken Bulgarian




The Corpus of Spoken Bulgarian (SpokenBg) is a collection of spoken Bulgarian data, including interviews, media and formal speech, student speech, academic speech, and colloquial speech. As of the end of 2012, the corpus totaled 523,128 characters.
Part of the corpus consists of edited transcripts of conversational speech that have been converted from a semi-phonetic transcription to standard orthography. The original semi-phonetic transcripts are presented alongside the edited versions in a paragraph-aligned display.

