GENIA POS & Term Corpus



A corpus of 2,000 MEDLINE abstracts, selected using the three MeSH terms human, blood cells and transcription factors. The corpus is available in three formats:

1) a text file containing part-of-speech (POS) annotation based on the Penn Treebank format;
2) an XML file containing inline POS annotation;
3) a “merged” XML format containing inline annotations for both POS and terms.
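The three formats differ mainly in how each token's POS tag, and any term label, is attached to it. The exact file layouts are not described here, so the Python sketch below rests on assumptions. It assumes that the Penn Treebank-style text file writes each token as `word/TAG`. It also assumes that the XML files mark each token as `<w c="TAG">word</w>` and each annotated term as `<cons sem="...">...</cons>`. The element and attribute names, the function names (`parse_slash_tagged`, `parse_inline_xml`, `parse_terms`) and the sample sentence are all hypothetical, so check them against the actual distribution files before relying on them.

```python
import xml.etree.ElementTree as ET


def parse_slash_tagged(line):
    """Split a 'word/TAG word/TAG ...' line into (word, tag) pairs.

    Assumption: tokens are whitespace-separated and the tag follows the LAST
    slash, so a word that itself contains '/' (e.g. 'and/or') stays intact.
    """
    pairs = []
    for token in line.split():
        word, sep, tag = token.rpartition("/")
        if not sep:
            continue  # token has no '/TAG' suffix; skip it (assumed malformed)
        pairs.append((word, tag))
    return pairs


def parse_inline_xml(xml_text):
    """Collect (word, tag) pairs from hypothetical <w c="TAG">word</w> elements.

    Works on the POS-only XML and, under the same assumptions, on the merged
    XML too, because iter() also finds <w> elements nested inside term markup.
    """
    root = ET.fromstring(xml_text)
    return [(w.text or "", w.get("c")) for w in root.iter("w")]


def parse_terms(xml_text):
    """Collect (term text, semantic label) pairs from hypothetical <cons sem="..."> elements."""
    root = ET.fromstring(xml_text)
    terms = []
    for cons in root.iter("cons"):
        words = [w.text or "" for w in cons.iter("w")]
        terms.append((" ".join(words), cons.get("sem")))
    return terms


if __name__ == "__main__":
    # Hypothetical samples, not taken from the corpus.
    print(parse_slash_tagged("IL-2/NN gene/NN expression/NN ./."))
    merged = ('<sentence><cons sem="G#protein_molecule"><w c="NN">IL-2</w></cons> '
              '<w c="NN">gene</w></sentence>')
    print(parse_inline_xml(merged))
    print(parse_terms(merged))
```

Splitting on the last slash with `rpartition` keeps words that contain slashes intact. Iterating with `iter()` lets a single token reader handle both the POS-only and the merged XML, provided the assumed element names hold.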
