Towards efficient named-entity rule induction for customizability

Authors:
Ajay Nagesh;Ganesh Ramakrishnan;Laura Chiticariu;Rajasekar Krishnamurthy;Ankush Dharkar;Pushpak Bhattacharyya
Affiliations:
IITB-Monash Research Academy and IIT Bombay;IIT Bombay;IBM Research - Almaden;IBM Research - Almaden;SASTRA University;IIT Bombay
Venue:
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Year:
2012

Abstract

Generic rule-based systems for Information Extraction (IE) have been shown to work reasonably well out of the box and to achieve state-of-the-art accuracy with further domain customization. However, it is generally recognized that manually building and customizing rules is a complex and labor-intensive process. In this paper, we discuss an approach that uses rule induction to facilitate building customizable rules for Named-Entity Recognition (NER) tasks in the Annotation Query Language (AQL). Given a set of basic features and an annotated document collection, our goal is to generate an initial set of rules that have reasonable accuracy and are interpretable, and thus can be easily refined by a human developer. We present an efficient rule induction process, modeled on a four-stage manual rule development process, and report promising initial results with our system. We also propose a simple notion of extractor complexity as a first step toward quantifying the interpretability of an extractor, and we study the effect of induction bias and customization of basic features on the accuracy and complexity of the induced rules. We demonstrate through experiments that the induced rules have good accuracy and low complexity according to our complexity measure.