Efficiently incorporating user feedback into information extraction and integration programs

Authors:
Xiaoyong Chai;Ba-Quy Vuong;AnHai Doan;Jeffrey F. Naughton
Affiliations:
University of Wisconsin-Madison, Madison, WI, USA;University of Wisconsin-Madison, Madison, WI, USA;University of Wisconsin-Madison, Madison, WI, USA;University of Wisconsin-Madison, Madison, WI, USA
Venue:
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Year:
2009

Abstract

Many applications increasingly employ information extraction and integration (IE/II) programs to infer structure from unstructured data. Automatic IE/II is inherently imprecise, so such programs often make mistakes and can benefit significantly from user feedback. Today, however, there is no good way to automatically provide and process such feedback. On finding an IE/II mistake, users often must alert the developer team (e.g., via email or a Web form) and then wait for the team to manually examine the program internals to locate and fix the mistake, a slow, error-prone, and frustrating process. In this paper we propose a solution that lets users provide feedback directly and lets IE/II programs process that feedback automatically. In our solution, a developer U uses hlog, a declarative IE/II language, to write an IE/II program P. Next, U writes declarative user feedback rules that specify which parts of P's data (e.g., input, intermediate, or output data) users can edit, and via which user interfaces. The augmented program P is then executed and enters a loop of waiting for and incorporating user feedback. Given user feedback F on a data portion of P, we show how to automatically propagate F to the rest of P and to seamlessly combine F with prior user feedback. We describe the syntax and semantics of hlog, a baseline execution strategy, and various optimization techniques. Finally, we describe experiments with real-world data that demonstrate the promise of our solution.
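The loop described in the abstract (run the program, accept an edit on some data portion, propagate it downstream, and keep earlier edits in force) can be illustrated with a small Python sketch. This is not hlog and not the paper's execution strategy: the class `FeedbackPipeline`, the stage functions `extract_names` and `dedupe`, and the sample sentences are all hypothetical, and the sketch assumes a simple linear pipeline in which feedback is a value-level correction on one stage's output.

```python
from typing import Callable, Dict, List

Stage = Callable[[List[str]], List[str]]


class FeedbackPipeline:
    """Toy linear IE pipeline with user corrections (hypothetical, not hlog)."""

    def __init__(self, stages: List[Stage]) -> None:
        self.stages = stages
        # corrections[i] maps a wrong value in stage i's output to its fix.
        self.corrections: List[Dict[str, str]] = [{} for _ in stages]
        self.cache: List[List[str]] = []  # cache[i] = corrected output of stage i
        self.docs: List[str] = []

    def run(self, docs: List[str]) -> List[str]:
        self.docs = list(docs)
        return self._rerun_from(0)

    def _rerun_from(self, start: int) -> List[str]:
        # Reuse cached upstream results; recompute only stage `start` onward.
        data = self.docs if start == 0 else self.cache[start - 1]
        self.cache = self.cache[:start]
        for i in range(start, len(self.stages)):
            raw = self.stages[i](data)
            # Re-apply every correction recorded so far for this stage.
            data = [self.corrections[i].get(x, x) for x in raw]
            self.cache.append(data)
        return data

    def feedback(self, stage: int, wrong: str, right: str) -> List[str]:
        # Record the edit, then propagate it to all downstream stages.
        self.corrections[stage][wrong] = right
        return self._rerun_from(stage)


def extract_names(docs: List[str]) -> List[str]:
    # Naive extractor: everything before " visited " is taken as a name.
    return [d.split(" visited ")[0] for d in docs]


def dedupe(names: List[str]) -> List[str]:
    return sorted(set(names))


pipeline = FeedbackPipeline([extract_names, dedupe])
docs = [
    "Jim Gray visited Madison",
    "Prof. Gray visited Seattle",
    "D. DeWitt visited Madison",
]
first = pipeline.run(docs)
# A user fixes an intermediate extraction; the dedupe stage is re-run.
second = pipeline.feedback(0, "Prof. Gray", "Jim Gray")
# A later edit on the output is combined with the earlier one.
third = pipeline.feedback(1, "D. DeWitt", "David DeWitt")
print(first, second, third, sep="\n")
```

Storing corrections per stage and re-applying them on every re-run is one simple way to keep prior feedback in force; caching each stage's output means an edit only triggers recomputation from the edited stage onward.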