Generic rule-based systems for Information Extraction (IE) have been shown to work reasonably well out of the box and to achieve state-of-the-art accuracy with further domain customization. However, manually building and customizing rules is widely recognized as a complex and labor-intensive process. In this paper, we discuss an approach that facilitates the construction of customizable rules for Named-Entity Recognition (NER) tasks via rule induction in the Annotation Query Language (AQL). Given a set of basic features and an annotated document collection, our goal is to generate an initial set of rules with reasonable accuracy that are interpretable and thus can easily be refined by a human developer. We describe an efficient rule induction process, modeled on a four-stage manual rule development process, and report promising initial results with our system. We also propose a simple notion of extractor complexity as a first step toward quantifying the interpretability of an extractor, and study the effect of induction bias and of customizing the basic features on the accuracy and complexity of the induced rules. Our experiments show that the induced rules achieve good accuracy and low complexity under this measure.
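
To illustrate the kind of output such a process targets, the following is a hypothetical sketch of an induced AQL extractor for a person-name NER task. The dictionary and view names (`FirstNameDict`, `CapsWord`) stand in for pre-defined basic features and are assumptions for this example, not rules from the paper itself:

```
-- Assumed basic feature: dictionary matches for common first names
create view FirstName as
  extract dictionary 'FirstNameDict'
    on D.text as match
  from Document D;

-- Candidate rule: a first name immediately followed by a
-- capitalized token (CapsWord is an assumed basic-feature view)
create view Person as
  select CombineSpans(F.match, C.match) as person
  from FirstName F, CapsWord C
  where FollowsTok(F.match, C.match, 0, 0);
```

Because each rule is a declarative view over named features, a developer can inspect, refine, or discard individual clauses, which is what motivates optimizing induced rules for low complexity as well as accuracy.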