CST Tokeniser


Sentence segmenter with optional tokenisation, multi-word unit (MWU) recognition and abbreviation recognition. Input is RTF (Rich Text Format) or flat text. For RTF input, layout and style information is used to recognise and properly treat elements such as headlines and bulleted lists.
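To illustrate why abbreviation recognition matters for sentence segmentation, here is a minimal Python sketch. It is not the CST Tokeniser's actual algorithm or interface: the function name `segment_sentences` and the small `ABBREVIATIONS` set are hypothetical, chosen only for this example, and the input is assumed to be flat text.

```python
# Hypothetical abbreviation list for illustration only; a real tool would
# use a much larger, language-specific resource.
ABBREVIATIONS = {"e.g.", "i.e.", "dr.", "mr.", "mrs.", "prof.", "etc."}


def segment_sentences(text):
    """Split flat text into sentences on ., ! or ?, except after known abbreviations."""
    sentences = []
    current = []
    for token in text.split():  # whitespace tokenisation, the simplest possible choice
        current.append(token)
        ends_with_terminator = token[-1] in ".!?"
        is_abbreviation = token.lower() in ABBREVIATIONS
        if ends_with_terminator and not is_abbreviation:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without a final terminator
        sentences.append(" ".join(current))
    return sentences


if __name__ == "__main__":
    sample = "Dr. Smith arrived. He uses tools, e.g. a tokeniser. Done!"
    for sentence in segment_sentences(sample):
        print(sentence)
```

Without the abbreviation check, the sample would be split after "Dr." and "e.g.". The sketch has an obvious limitation: an abbreviation that genuinely ends a sentence (such as a final "etc.") is never treated as a boundary, so a practical segmenter needs further cues such as the capitalisation of the following word.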

