The SystemT IDE: an integrated development environment for information extraction rules

Authors:
Laura Chiticariu;Vivian Chu;Sajib Dasgupta;Thilo W. Goetz;Howard Ho;Rajasekar Krishnamurthy;Alexander Lang;Yunyao Li;Bin Liu;Sriram Raghavan;Frederick R. Reiss;Shivakumar Vaithyanathan;Huaiyu Zhu
Affiliations:
IBM Research - Almaden, San Jose, USA;IBM Research - Almaden, San Jose, USA;IBM Research - Almaden, San Jose, USA;IBM Software - Germany, Boeblingen, Germany;IBM Research - Almaden, San Jose, USA;IBM Research - Almaden, San Jose, USA;IBM Software - Germany, Boeblingen, Germany;IBM Research - Almaden, San Jose, USA;University of Michigan, Ann Arbor, MI, USA;IBM Research - India, Bangalore, India;IBM Research - Almaden, San Jose, USA;IBM Research - Almaden, San Jose, USA;IBM Research - Almaden, San Jose, USA
Venue:
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Year:
2011

Abstract

Information Extraction (IE), the problem of extracting structured information from unstructured text, has become the key enabler for many enterprise applications such as semantic search, business analytics, and regulatory compliance. While rule-based IE systems are widely used in practice due to their well-known "explainability," developing high-quality information extraction rules is known to be a labor-intensive and time-consuming iterative process. Our demonstration showcases SystemT IDE, the integrated development environment for SystemT, a state-of-the-art rule-based IE system from IBM Research that has been successfully embedded in multiple IBM enterprise products. SystemT IDE facilitates the development, testing, and analysis of high-quality IE rules by means of sophisticated techniques, ranging from data management to machine learning. We show how to build high-quality IE annotators using a suite of tools provided by SystemT IDE, including computing data provenance, learning basic features such as regular expressions and dictionaries, and automatically refining rules based on labeled examples.
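
As a rough illustration of the ideas named in the abstract, the sketch below shows a toy rule-based extractor in Python in which every extracted span records the rule that produced it (a very simple form of provenance), and per-rule precision against labeled examples points to the rule most in need of refinement. This is not SystemT, its rule language, or the IDE's tooling; the rule names, the regular expression, the dictionary entries, and all function names are hypothetical and chosen only for the example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    begin: int
    end: int
    text: str
    rule: str  # provenance: name of the (hypothetical) rule that produced the span


# Hypothetical basic features: one regular expression and one dictionary.
PHONE_REGEX = re.compile(r"\b\d{3}-\d{4}\b")
FIRST_NAMES = {"laura", "howard", "sriram"}


def regex_rule(text):
    """Emit one span per regex match, tagged with the rule name."""
    return [Span(m.start(), m.end(), m.group(), "PhoneRegex")
            for m in PHONE_REGEX.finditer(text)]


def dictionary_rule(text):
    """Emit one span per token found in the dictionary (case-insensitive)."""
    return [Span(m.start(), m.end(), m.group(), "FirstNameDict")
            for m in re.finditer(r"\b\w+\b", text)
            if m.group().lower() in FIRST_NAMES]


def extract(text):
    """Run all rules and return spans in document order."""
    return sorted(regex_rule(text) + dictionary_rule(text),
                  key=lambda s: s.begin)


def precision_by_rule(spans, gold):
    """Per-rule precision, where gold is a set of correct (begin, end) pairs."""
    totals, correct = {}, {}
    for s in spans:
        totals[s.rule] = totals.get(s.rule, 0) + 1
        if (s.begin, s.end) in gold:
            correct[s.rule] = correct.get(s.rule, 0) + 1
    return {rule: correct.get(rule, 0) / totals[rule] for rule in totals}
```

Because each span carries its originating rule, a low precision score can be traced back to a specific rule, which is the kind of signal that a refinement step based on labeled examples would act on.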