bgMWE is a tool for corpus processing and for the recognition and tagging of multiword expressions (MWEs). It is implemented in Java and is therefore platform independent. bgMWE comprises a set of modules that can be applied to particular NLP tasks. It is largely language independent and can work either in a resource-light mode or with lexical resources that boost its performance. The system includes the following modules: a Web crawler for Wikipedia; extraction of lexical data – lists of words and MWEs; a converter between formats – vertical format, XML, etc.; a preprocessing module – applying a chunker, a tagger, etc.; collection of frequency data; and MWE recognition and tagging.
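To make two of these modules more concrete, the following is a minimal Java sketch of what collecting frequency data and lexicon-based MWE tagging might involve. It is not bgMWE's implementation. The class `MweSketch`, the methods `countNgrams` and `tagMwes`, the `maxLen` parameter and the greedy longest-match strategy are all hypothetical assumptions. The `lexicon` set stands in for the optional lexical resources mentioned above, and matching is case-sensitive.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch (not bgMWE's actual code) of two steps named in the
 * module list: collecting n-gram frequency data and lexicon-based MWE tagging.
 */
final class MweSketch {

    /** Counts every contiguous n-gram in the token list; n-gram tokens are joined by single spaces. */
    static Map<String, Integer> countNgrams(List<String> tokens, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            String gram = String.join(" ", tokens.subList(i, i + n));
            counts.merge(gram, 1, Integer::sum); // increment or initialise to 1
        }
        return counts;
    }

    /**
     * Greedy longest-match tagging (an assumed strategy): at each position, try the
     * longest span of 2..maxLen tokens found in the lexicon. A matched span is emitted
     * as one unit with its tokens joined by '_'; unmatched tokens pass through unchanged.
     */
    static List<String> tagMwes(List<String> tokens, Set<String> lexicon, int maxLen) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matched = 0;
            for (int len = Math.min(maxLen, tokens.size() - i); len >= 2; len--) {
                String candidate = String.join(" ", tokens.subList(i, i + len));
                if (lexicon.contains(candidate)) {
                    matched = len;
                    break; // longest candidate wins
                }
            }
            if (matched > 0) {
                out.add(String.join("_", tokens.subList(i, i + matched)));
                i += matched;
            } else {
                out.add(tokens.get(i));
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("Sofia", "University", "is", "in", "Sofia", "University", "district");
        Set<String> lexicon = Set.of("Sofia University");
        // prints [Sofia_University, is, in, Sofia_University, district]
        System.out.println(tagMwes(tokens, lexicon, 3));
        // prints 2
        System.out.println(countNgrams(tokens, 2).get("Sofia University"));
    }
}
```

Without a lexicon, frequency counts such as those returned by `countNgrams` are the kind of data a resource-light mode could rely on to propose MWE candidates. When a lexicon is available, it can be used for direct matching as in `tagMwes`.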
Further development of bgMWE is planned in the following directions: improving efficiency; implementing various methods for MWE recognition; developing a visualisation module or integrating existing open-source visualisation tools; and adding a module for extensive evaluation.