Domain adaptation of rule-based annotators for named-entity recognition tasks

  • Authors:
  • Laura Chiticariu; Rajasekar Krishnamurthy; Yunyao Li; Frederick Reiss; Shivakumar Vaithyanathan

  • Affiliations:
  • IBM Research -- Almaden, San Jose, CA; IBM Research -- Almaden, San Jose, CA; IBM Research -- Almaden, San Jose, CA; IBM Research -- Almaden, San Jose, CA; IBM Research -- Almaden, San Jose, CA

  • Venue:
  • EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
  • Year:
  • 2010

Abstract

Named-entity recognition (NER) is an important task required in a wide variety of applications. While rule-based systems are appealing due to their well-known "explainability," most, if not all, state-of-the-art results for NER tasks are based on machine learning techniques. Motivated by these results, we explore the following natural question in this paper: Are rule-based systems still a viable approach to named-entity recognition? Specifically, we have designed and implemented a high-level language NERL on top of SystemT, a general-purpose algebraic information extraction system. NERL is tuned to the needs of NER tasks and simplifies the process of building, understanding, and customizing complex rule-based named-entity annotators. We show that these customized annotators match or outperform the best published results achieved with machine learning techniques. These results confirm that we can reap the benefits of rule-based extractors' explainability without sacrificing accuracy. We conclude by discussing lessons learned while building and customizing complex rule-based annotators and outlining several research directions towards facilitating rule development.
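The abstract's two main claims about rule-based annotators are that their output is explainable and that they can be customized to a new domain. The sketch below illustrates both ideas in generic form. It is not NERL or SystemT syntax, and it does not reproduce the authors' method. The `Rule`/`Annotation` types, the rule names, and the regular expressions are all hypothetical. Each annotation records the rule that produced it, which is the source of explainability. Domain adaptation is modeled as adding a rule to the rule set rather than retraining a model.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str          # hypothetical rule identifier, kept for provenance
    entity_type: str   # e.g. "Organization"
    pattern: str       # regular expression applied to raw text


@dataclass(frozen=True)
class Annotation:
    start: int
    end: int
    text: str
    entity_type: str
    rule: str          # name of the rule that fired ("explainability")


def annotate(text: str, rules: list[Rule]) -> list[Annotation]:
    """Apply every rule and record, for each match, which rule produced it."""
    results = []
    for rule in rules:
        for m in re.finditer(rule.pattern, text):
            results.append(
                Annotation(m.start(), m.end(), m.group(), rule.entity_type, rule.name)
            )
    # Sort by position so the output reads left to right.
    return sorted(results, key=lambda a: (a.start, a.end))


# Hypothetical base rule set: capitalized word followed by a corporate suffix.
base_rules = [
    Rule("OrgSuffix", "Organization", r"\b[A-Z][a-z]+ (?:Inc|Corp|Ltd)\b\.?"),
]

# Hypothetical domain customization: extend the rules instead of retraining.
adapted_rules = base_rules + [
    Rule("ResearchLab", "Organization", r"\bIBM Research\b"),
]

if __name__ == "__main__":
    sample = "Acme Corp hired staff from IBM Research."
    for a in annotate(sample, adapted_rules):
        print(a.text, a.entity_type, "via", a.rule)
```

With only `base_rules`, the sample sentence yields a single match, "Acme Corp" (rule `OrgSuffix`). With `adapted_rules`, it also yields "IBM Research" (rule `ResearchLab`). Because every match carries the name of its rule, a developer can trace each decision back to the rule that produced it.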