Managing information extraction: state of the art and research directions

Authors: AnHai Doan; Raghu Ramakrishnan; Shivakumar Vaithyanathan
Affiliations: University of Illinois; University of Wisconsin; IBM Research at Almaden
Venue: Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Year: 2006

Abstract

This tutorial makes the case for developing a unified framework that manages information extraction from unstructured data, focusing in particular on text. We first survey recent research on information extraction in the database, AI, NLP, IR, and Web communities. We then discuss why this is the right time for the database community to take an active part in addressing the problem of managing information extraction, including in particular the challenges of maintaining and querying the extracted information and of accounting for the imprecision and uncertainty inherent in the extraction process. Finally, we show how interested researchers can take the next step by pointing to open problems, available datasets, applicable standards, and software tools. We do not assume prior knowledge of text management, NLP, extraction techniques, or machine learning.