Toward best-effort information extraction

Authors:
Warren Shen;Pedro DeRose;Robert McCann;AnHai Doan;Raghu Ramakrishnan
Affiliations:
University of Wisconsin, Madison, WI, USA;University of Wisconsin, Madison, WI, USA;Microsoft, Redmond, WA, USA;University of Wisconsin, Madison, WI, USA;Yahoo! Research, Santa Clara, CA, USA
Venue:
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Year:
2008

Abstract

Current approaches to developing information extraction (IE) programs have largely focused on producing precise IE results. As such, they suffer from three major limitations. First, it is often difficult to execute partially specified IE programs and obtain meaningful results, which leads to a long "debug loop". Second, it often takes a long time to obtain the first meaningful result (by finishing and running a precise IE program), which renders these approaches impractical for time-sensitive IE applications. Finally, trying to write precise IE programs may waste a significant amount of effort, because an approximate result, one that can be produced quickly, may already be satisfactory in many IE settings. To address these limitations, we propose iFlex, an IE approach that relaxes the precise IE requirement to enable best-effort IE. In iFlex, a developer U uses a declarative language to quickly write an initial approximate IE program P with a possible-worlds semantics. iFlex then evaluates P using an approximate query processor to quickly extract an approximate result. Next, U examines the result and, if necessary, refines P further to obtain increasingly precise results. To refine P, U can enlist a next-effort assistant, which suggests refinements based on the data and the current version of P. Extensive experiments on real-world domains demonstrate the utility of the iFlex approach.
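
The write, run, and refine loop described in the abstract can be illustrated with a toy sketch. This is not iFlex's declarative language or its approximate query processor, neither of which is specified in the abstract. It is a plain Python stand-in in which an underspecified extractor returns a set of candidate values and each refinement adds a constraint that narrows that set. The sample text, the `candidates` function, and both constraint functions are hypothetical names invented for this illustration.

```python
import re
from typing import Callable, List

# Hypothetical sample page; not taken from the paper.
PAGE = "Cozy 3 bed house, built 1987, price: $351,000. Call 555-0199."

# A constraint inspects the page text and one candidate match.
Constraint = Callable[[str, re.Match], bool]


def candidates(text: str, constraints: List[Constraint]) -> List[str]:
    """Return every numeric span that satisfies all constraints.

    With few constraints the result is a loose superset of the
    intended value; each added constraint tightens it.
    """
    found = []
    for m in re.finditer(r"\d+(?:,\d{3})*", text):
        if all(check(text, m) for check in constraints):
            found.append(m.group())
    return found


def at_least_four_digits(text: str, m: re.Match) -> bool:
    # Assumed refinement: a price has at least four digits.
    return len(m.group().replace(",", "")) >= 4


def preceded_by_dollar(text: str, m: re.Match) -> bool:
    # Assumed refinement: a price directly follows a dollar sign.
    return m.start() > 0 and text[m.start() - 1] == "$"


# Version 0: "the price is some number on the page".
v0 = candidates(PAGE, [])
# Version 1: the developer inspects v0 and adds one constraint.
v1 = candidates(PAGE, [at_least_four_digits])
# Version 2: a second constraint leaves a single candidate.
v2 = candidates(PAGE, [at_least_four_digits, preceded_by_dollar])

print(v0)  # ['3', '1987', '351,000', '555', '0199']
print(v1)  # ['1987', '351,000', '0199']
print(v2)  # ['351,000']
```

Every version of the program is runnable and returns something inspectable, so the developer can stop as soon as the candidate set is good enough for the task at hand. Choosing which constraint to add next is the step that the abstract assigns to the next-effort assistant; in this sketch that choice is made by hand.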