U-Compare E-txt2DB: Giving structure to unstructured data

U-Compare E-txt2DB

http://web.ist.utl.pt/rui.lageira/

Etxt2DB is a framework for specifying and executing Entity Recognition (ER) programs. An ER program takes as input a text that may contain entities of interest and outputs the same text annotated with the entities it recognized.

Etxt2DB operates in two distinct phases. In the training phase, a classification model is created from a given ER technique and one or more resources that guide its construction. Examples of such resources are dictionaries for rule-based ER techniques and training data for statistical learning techniques (e.g., Conditional Random Fields). In the execution phase, a previously created classification model receives plain text as input and produces annotations corresponding to the recognized entities.
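The two phases can be illustrated with a minimal Java sketch of a dictionary-based recognizer. It is not the Etxt2DB API, nor that of Minorthird or Lingpipe: the class `DictionaryErSketch` and its `train` and `execute` methods are hypothetical names chosen for illustration, and plain case-insensitive substring matching stands in for a real ER technique. The tags `PER` and `LOC` are taken from the tagset listed in this record.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only: none of these names come from Etxt2DB,
// Minorthird, or Lingpipe.
final class DictionaryErSketch {

    // The "classification model": lowercased dictionary term -> entity tag.
    static final class Model {
        final Map<String, String> termToTag = new LinkedHashMap<String, String>();
    }

    // One recognized entity: offsets into the input text, plus its tag.
    static final class Annotation {
        final int start; // inclusive
        final int end;   // exclusive
        final String tag;

        Annotation(int start, int end, String tag) {
            this.start = start;
            this.end = end;
            this.tag = tag;
        }
    }

    // Training phase: build a model from a resource (here, a term -> tag dictionary).
    static Model train(Map<String, String> dictionary) {
        Model model = new Model();
        for (Map.Entry<String, String> entry : dictionary.entrySet()) {
            String term = entry.getKey().toLowerCase(Locale.ROOT);
            if (!term.isEmpty()) { // an empty term would match everywhere
                model.termToTag.put(term, entry.getValue());
            }
        }
        return model;
    }

    // Execution phase: apply a previously built model to plain text.
    // Simplification: substring matching without word boundaries; offsets
    // assume lowercasing does not change the text length (true for ASCII).
    static List<Annotation> execute(Model model, String text) {
        List<Annotation> annotations = new ArrayList<Annotation>();
        String lower = text.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> entry : model.termToTag.entrySet()) {
            String term = entry.getKey();
            int from = 0;
            int index;
            while ((index = lower.indexOf(term, from)) >= 0) {
                annotations.add(new Annotation(index, index + term.length(), entry.getValue()));
                from = index + term.length();
            }
        }
        return annotations;
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = new LinkedHashMap<String, String>();
        dictionary.put("Lisbon", "LOC");
        dictionary.put("Helena", "PER");

        Model model = train(dictionary);
        String text = "Helena works in Lisbon.";
        for (Annotation a : execute(model, text)) {
            System.out.println(a.tag + " [" + a.start + "," + a.end + ") "
                    + text.substring(a.start, a.end));
        }
    }
}
```

The point of the separation is that the model returned by `train` is built once and can then be passed to `execute` for any number of input texts, mirroring the way a model created in the training phase is reused in the execution phase.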

The Etxt2DB framework is a software layer built on top of Minorthird and Lingpipe that offers a command-like specification language. Existing machine learning Java APIs (such as Minorthird and Lingpipe) provide implementations of Entity Recognition techniques, but some developers of ER applications do not want to deal with the implementation details of those techniques. Instead, they prefer to focus on the choice of technique, the resources used in the process (e.g., dictionaries), and a good set of features that help the ER program make adequate decisions. The Etxt2DB specification language aims to make developing and tuning ER programs easier for developers who are mainly concerned with these topics.

In the context of the METANET project, the goal was to build a component-generator tool that encapsulates Etxt2DB. In the training phase, this tool accepts a training data set as input and produces both a classification model and a U-Compare component that is able to interpret that model. In the execution phase, the generated component is loaded into the U-Compare platform, where it is ready to recognize entities in text.

Distribution

Availability

Available - Unrestricted Use

Licence

GPL

Download location: hidden

Distribution Access/Medium: Downloadable

Contact Persons

Gonçalo Simões

Rui Lageira

Helena Galhardas

Tool Service

Tool: Entity Recognition

Language Independent

Input

Media type: Text

Resource type: Lexical Conceptual Resource

Modality: Written Language

Mime type: txt

Character encoding: UTF-8

Output

Media type: Text

Resource type: Lexical Conceptual Resource

Language: English

Character encoding: UTF-8

Annotation type: Semantic Annotation - Named Entities

Annotation format: txt

Tagset: PER, LOC, STIME, ETIME, etc.

Operation

Operating system: OS-independent

Required Software

Java Runtime Environment

Running environment details: Version 1.6 or above recommended

Evaluation

Evaluated: True

Evaluation level: Usage

Evaluator: Rui Lageira

Evaluator: Gonçalo Simões

Evaluator: Helena Galhardas

Creation

Programming language: Java Programming Language

Resource Creation

Resource Creator

Gonçalo Simões

Metadata

Created: 11/23/2012

Last Updated: 11/26/2012

Source: METANET4U

Metadata Language: English (en)

Revision: v1

Metadata Creator

Rui Lageira

Gonçalo Simões

Documentation

Document Type: Manual

Gonçalo Simões, Manual of E-txt2DB, https://www.l2f.ines...
