A rapid application development framework for rule-based named-entity extraction

  • Authors:
  • Ashish Sureka;Pranav Prabhakar Mirajkar;Kishore Indukuri Varma

  • Affiliations:
  • Infosys Technologies Limited, Bangalore, India;Infosys Technologies Limited, Bangalore, India;Infosys Technologies Limited, Bangalore, India

  • Venue:
  • Proceedings of the 2nd Bangalore Annual Compute Conference
  • Year:
  • 2009

Abstract

Named Entity Recognition and Classification (NERC) consists of identifying and labeling specific pieces of information, such as proper names, in free-form textual data. There are three primary approaches to named-entity extraction: hand-crafted rule-based, machine-learning-based, and hybrid. Rule-based approaches define heuristics in the form of regular expressions or linguistic patterns and use dictionaries and lexicons to extract named entities. They have proven quite successful, but they require a domain expert to define and encode the rules manually. Hand-engineering rules is time-consuming and tedious, and the resulting rule sets cannot easily be ported to other domains and languages and become hard to maintain. Machine-learning-based approaches try to overcome these limitations by automatically learning rules or inducing a model instead of relying on a human expert to define the rules. In this work, we present our research on overcoming the limitations of the rule-based approach by building a rapid application development framework that expedites rule building and makes rules easier to maintain and apply to other domains. We describe a framework that enables a business user to easily define and maintain rules and lexicons.
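
To make the rule-based approach described in the abstract concrete, the following is a minimal sketch of an extractor that combines a lexicon lookup with regular-expression rules. It is not the framework presented in the paper, whose design is not detailed here. The names `LEXICON`, `RULES`, and `extract_entities`, the entity labels, the patterns, and the sample sentence (including the person name) are all assumptions invented for illustration.

```python
import re

# Hypothetical lexicon mapping surface forms to entity labels (illustrative only).
LEXICON = {
    "Infosys Technologies Limited": "ORGANIZATION",
    "Bangalore": "LOCATION",
    "India": "LOCATION",
}

# Hypothetical regular-expression rules as (label, compiled pattern) pairs.
RULES = [
    # A four-digit year starting with 19 or 20.
    ("YEAR", re.compile(r"\b(?:19|20)\d{2}\b")),
    # A title followed by one or more capitalized words.
    ("PERSON", re.compile(r"\b(?:Mr|Ms|Dr)\. [A-Z][a-z]+(?: [A-Z][a-z]+)*")),
]


def extract_entities(text):
    """Return (start, end, label, surface) tuples sorted by position in text."""
    found = []
    # Dictionary lookup: match each lexicon entry as a whole phrase.
    for term, label in LEXICON.items():
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", text):
            found.append((m.start(), m.end(), label, m.group()))
    # Pattern rules: apply every regular expression to the text.
    for label, pattern in RULES:
        for m in pattern.finditer(text):
            found.append((m.start(), m.end(), label, m.group()))
    return sorted(found)


# Invented sample sentence; the person name is made up for the example.
sample = "Dr. Asha Rao joined Infosys Technologies Limited in Bangalore in 2009."
entities = extract_entities(sample)
for start, end, label, surface in entities:
    print(f"{label:13} {surface}")
```

In this sketch the rules and the lexicon are kept as plain data, separate from the matching code. That separation is one simple way to let a non-programmer add or edit entries without touching the extraction logic, which is the kind of maintainability the abstract aims for.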