SystemT: an algebraic approach to declarative information extraction

  • Authors:
  • Laura Chiticariu; Rajasekar Krishnamurthy; Yunyao Li; Sriram Raghavan; Frederick R. Reiss; Shivakumar Vaithyanathan

  • Affiliations:
  • IBM Research -- Almaden, San Jose, CA (all authors)

  • Venue:
  • ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
  • Year:
  • 2010

Abstract

As information extraction (IE) becomes more central to enterprise applications, rule-based IE engines have become increasingly important. In this paper, we describe SystemT, a rule-based IE system whose basic design removes the expressivity and performance limitations of current systems based on cascading grammars. SystemT uses a declarative rule language, AQL, and an optimizer that generates high-performance algebraic execution plans for AQL rules. We compare SystemT's approach against cascading grammars, both theoretically and with a thorough experimental evaluation. Our results show that SystemT can deliver result quality comparable to the state of the art and an order of magnitude higher annotation throughput.
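The abstract contrasts cascading grammars with algebraic execution plans for declarative rules. As a rough illustration only, the Python sketch below composes a few operators over text spans into a small plan that finds a person name followed closely by a phone number. The operator names (`extract_regex`, `union`, `consolidate`, `join_follows`), the character-offset span model, and the "drop spans contained in another span" consolidation rule are all our own assumptions for this sketch; they are not AQL syntax or the SystemT API, and the sketch performs no optimization.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical span operators for illustration; NOT the SystemT/AQL API.


@dataclass(frozen=True, order=True)
class Span:
    """A half-open character range [begin, end) over the input document."""
    begin: int
    end: int


def extract_regex(doc: str, pattern: str) -> List[Span]:
    """Leaf operator: one span per non-overlapping regex match."""
    return [Span(m.start(), m.end()) for m in re.finditer(pattern, doc)]


def union(*inputs: List[Span]) -> List[Span]:
    """Set union of span lists, returned in document order."""
    return sorted(set().union(*inputs))


def consolidate(spans: List[Span]) -> List[Span]:
    """Assumed rule: drop any span contained within a different span."""
    unique = set(spans)
    return sorted(
        s for s in unique
        if not any(o != s and o.begin <= s.begin and s.end <= o.end for o in unique)
    )


def join_follows(left: List[Span], right: List[Span],
                 min_gap: int, max_gap: int) -> List[Span]:
    """Join: pair a left span with a right span starting min_gap..max_gap
    characters after it ends, and emit the span covering both."""
    return [
        Span(a.begin, b.end)
        for a in left
        for b in right
        if min_gap <= b.begin - a.end <= max_gap
    ]


# A small plan: two name rules -> union -> consolidate -> join with phones.
doc = "Contact Alice Smith at 555-0123 or Bob at 555-0199."
first = extract_regex(doc, r"\b(?:Alice|Bob)\b")
full = extract_regex(doc, r"\b(?:Alice|Bob) [A-Z][a-z]+\b")
names = consolidate(union(first, full))
phones = extract_regex(doc, r"\b\d{3}-\d{4}\b")
person_phone = join_follows(names, phones, 0, 5)

print([doc[s.begin:s.end] for s in person_phone])
# ['Alice Smith at 555-0123', 'Bob at 555-0199']
```

Because every operator here consumes and produces plain lists of spans, the plan is an ordinary expression tree that could in principle be inspected or rewritten as a whole; how SystemT's optimizer actually chooses among plans is described in the paper, not in this sketch.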