Enterprise information extraction: recent developments and open challenges

Authors:
Laura Chiticariu;Yunyao Li;Sriram Raghavan;Frederick R. Reiss
Affiliations:
IBM Research - Almaden, San Jose, CA, USA;IBM Research - Almaden, San Jose, CA, USA;IBM Research - Almaden, San Jose, CA, USA;IBM Research - Almaden, San Jose, CA, USA
Venue:
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Year:
2010

Citing 6

Managing information extraction: state of the art and research directions
Proceedings of the 2006 ACM SIGMOD international conference on Management of data

SystemT: a system for declarative information extraction
ACM SIGMOD Record

Information extraction challenges in managing unstructured data
ACM SIGMOD Record

Purple SOX extraction management system
ACM SIGMOD Record

Building query optimizers for information extraction: the SQoUT project
ACM SIGMOD Record

RAD: A Scalable Framework for Annotator Development
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering

Cited 5

Service-oriented information extraction
Proceedings of the 2011 Joint EDBT/ICDT Ph.D. Workshop

The SystemT IDE: an integrated development environment for information extraction rules
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data

SystemT: a declarative information extraction system
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Systems Demonstrations

Facilitating pattern discovery for relation extraction with semantic-signature-based clustering
Proceedings of the 20th ACM international conference on Information and knowledge management

WizIE: a best practices guided development environment for information extraction
ACL '12 Proceedings of the ACL 2012 System Demonstrations

Abstract

Information extraction (IE), the problem of extracting structured information from unstructured text, has become an increasingly important topic in recent years. A SIGMOD 2006 tutorial [3] outlined challenges and opportunities for the database community to advance the state of the art in information extraction, and posed the following grand challenge: "Can we build a System R for information extraction?" Our tutorial gives an overview of the progress the database community has made towards meeting this challenge. We start by discussing the design requirements for building an enterprise IE system. We then survey recent technological advances that address these requirements, broadly categorized as: (1) languages for specifying extraction programs declaratively, which allows database-style performance optimizations; (2) infrastructure needed to ensure scalability; and (3) development support for enterprise IE systems. Finally, we outline several open challenges and opportunities for the database community to further advance the state of the art in enterprise IE systems. The tutorial is intended for students and researchers interested in information extraction and its applications, and assumes no prior knowledge of the area.