bgMWE – a tool for MWE recognition

bgMWE is a tool for corpus processing and for the recognition and tagging of multiword expressions (MWEs). It is written in Java and is therefore platform independent. bgMWE consists of a set of modules, each of which can be applied to a particular NLP task. The tool is largely language independent: it can run in a resource-light mode, or its performance can be improved by supplying lexical resources. The system includes the following modules:
Web crawler for Wikipedia;
Extraction of lexical data – lists of words and MWEs;
Converter between formats – vertical format, XML, etc.;
Preprocessing module – applying a chunker, a tagger, etc.;
Collection of frequency data;
MWE recognition and tagging.
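The recognition method used by bgMWE is not specified here. As one illustration of how the recognition and tagging module could use a lexical resource, such as a list of MWEs produced by the extraction module, the following is a minimal sketch assuming a greedy longest-match lexicon lookup. The class `LexiconMweTagger`, its methods, and the convention of joining a recognised MWE's tokens with underscores are all hypothetical and are not part of bgMWE's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of lexicon-based MWE tagging (not bgMWE's actual API).
 * Greedy longest match: at each token position, take the longest lexicon
 * entry that starts there; otherwise emit the single token unchanged.
 */
class LexiconMweTagger {
    // Each entry is a lowercased token sequence; List.equals/hashCode compare element-wise.
    private final Set<List<String>> lexicon = new HashSet<>();
    private int maxLen = 1;

    /** Adds an MWE given as a space-separated string, e.g. "New York". */
    public void addEntry(String mwe) {
        List<String> tokens = Arrays.asList(mwe.toLowerCase().split("\\s+"));
        lexicon.add(tokens);
        maxLen = Math.max(maxLen, tokens.size());
    }

    /** Returns the tokens with each recognised MWE joined by underscores. */
    public List<String> tag(List<String> tokens) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matched = 0;
            // Try the longest candidate window first, down to two tokens.
            for (int len = Math.min(maxLen, tokens.size() - i); len >= 2; len--) {
                List<String> window = new ArrayList<>();
                for (String t : tokens.subList(i, i + len)) {
                    window.add(t.toLowerCase());
                }
                if (lexicon.contains(window)) {
                    matched = len;
                    break;
                }
            }
            if (matched > 0) {
                out.add(String.join("_", tokens.subList(i, i + matched)));
                i += matched;
            } else {
                out.add(tokens.get(i));
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        LexiconMweTagger tagger = new LexiconMweTagger();
        tagger.addEntry("New York");
        tagger.addEntry("New York Times");
        List<String> sentence =
            Arrays.asList("She", "reads", "the", "New", "York", "Times", "daily");
        System.out.println(tagger.tag(sentence));
    }
}
```

Running `main` prints `[She, reads, the, New_York_Times, daily]`: the longer entry takes precedence over "New York". Greedy longest match is simple and fast, but it does not weigh overlapping candidates against each other, and it only finds MWEs already in the lexicon; in a resource-light setting, statistics such as the frequency data collected by another module would have to take the lexicon's place.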

Further development of bgMWE is planned in the following directions: improving efficiency; implementing various methods for MWE recognition; developing a visualisation module or integrating existing open-source visualisation tools; and developing a module for extensive evaluation.

