A rapid application development framework for rule-based named-entity extraction

Authors:
Ashish Sureka;Pranav Prabhakar Mirajkar;Kishore Indukuri Varma
Affiliations:
Infosys Technologies Limited, Bangalore, India;Infosys Technologies Limited, Bangalore, India;Infosys Technologies Limited, Bangalore, India
Venue:
Proceedings of the 2nd Bangalore Annual Compute Conference
Year:
2009

Abstract

Named Entity Recognition and Classification (NERC) consists of identifying and labeling specific pieces of information, such as proper names, in free-form textual data. There are three primary approaches to named-entity extraction: hand-crafted rule-based, machine-learning-based, and hybrid. Rule-based approaches define heuristics in the form of regular expressions or linguistic patterns and use dictionaries and lexicons to extract named entities. They have proven quite successful, but they require a domain expert to manually define and encode the rules. Hand-engineering rules is time-consuming and tedious, and the resulting rule sets cannot be easily ported to other domains and languages and become hard to maintain. Machine-learning-based approaches try to overcome these limitations by automatically learning rules or inducing a model, rather than relying on a human expert to define the rules. In this work, we present our research on overcoming the limitations of the rule-based approach by building a rapid application development framework that expedites rule-building and makes rules easy to maintain and apply to other domains. We describe a framework that enables a business user to easily define and maintain rules and lexicons.
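
To make the rule-based approach described above concrete, the following is a minimal sketch of an extractor that combines a lexicon lookup with regular-expression rules. It is not the framework presented in the paper: the names `LEXICON`, `RULES`, and `extract_entities`, the entity labels, and the sample patterns are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical lexicon mapping surface forms to entity types (illustrative only).
LEXICON = {
    "Infosys": "ORGANIZATION",
    "Bangalore": "LOCATION",
}

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

# Hypothetical regular-expression rules: (entity type, compiled pattern).
RULES = [
    ("DATE", re.compile(r"\b\d{1,2} (?:" + MONTHS + r") \d{4}\b")),
    ("MONEY", re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?")),
]


def extract_entities(text):
    """Return (start, end, type, surface) tuples sorted by start offset."""
    found = []
    # Dictionary lookup: match each lexicon entry as a whole word.
    for term, label in LEXICON.items():
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", text):
            found.append((m.start(), m.end(), label, m.group()))
    # Pattern rules: apply each regular expression to the full text.
    for label, pattern in RULES:
        for m in pattern.finditer(text):
            found.append((m.start(), m.end(), label, m.group()))
    return sorted(found)


if __name__ == "__main__":
    sample = "Infosys opened an office in Bangalore on 12 March 2009 for $1,500,000."
    for start, end, label, surface in extract_entities(sample):
        print(f"{label}: {surface} [{start}:{end}]")
```

In this sketch the rules and lexicon are plain data, kept separate from the matching logic. That separation is one way a non-programmer could maintain them without touching code, though the paper's actual design may differ.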