Many applications increasingly employ information extraction and integration (IE/II) programs to infer structure from unstructured data. Automatic IE/II is inherently imprecise, so such programs often make mistakes and can benefit significantly from user feedback. Today, however, there is no good way to automatically provide and process such feedback. Users who find an IE/II mistake often must alert the developer team (e.g., via email or a Web form) and then wait for the team to manually examine the program internals to locate and fix the mistake, which is a slow, error-prone, and frustrating process. In this paper we propose a solution that lets users provide feedback directly and lets IE/II programs process that feedback automatically. In our solution, a developer U uses hlog, a declarative IE/II language, to write an IE/II program P. Next, U writes declarative user feedback rules that specify which parts of P's data (e.g., input, intermediate, or output data) users can edit, and via which user interfaces. The so-augmented program P is then executed, after which it enters a loop of waiting for and incorporating user feedback. Given user feedback F on a data portion of P, we show how to automatically propagate F to the rest of P and to seamlessly combine F with prior user feedback. We describe the syntax and semantics of hlog, a baseline execution strategy, and then various optimization techniques. Finally, we describe experiments with real-world data that demonstrate the promise of our solution.
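The execute-then-incorporate-feedback loop described above can be illustrated with a small Python sketch. This is not hlog and does not reproduce its syntax, its semantics, or the paper's execution strategy. The two-stage pipeline, the names `extract_people`, `count_mentions`, `FeedbackStore`, and `run`, and the rule that later feedback overrides earlier feedback are all assumptions made for illustration. Here users may edit only the intermediate relation, and feedback is propagated by simply re-running the downstream stage.

```python
import re


def extract_people(docs):
    """Toy extraction stage (hypothetical): every capitalized word is a 'person'."""
    return {
        (doc_id, name)
        for doc_id, text in docs.items()
        for name in re.findall(r"\b[A-Z][a-z]+\b", text)
    }


def count_mentions(people):
    """Toy downstream stage (hypothetical): count extracted mentions per name."""
    counts = {}
    for _doc_id, name in people:
        counts[name] = counts.get(name, 0) + 1
    return counts


class FeedbackStore:
    """Accumulates user edits on the intermediate relation across rounds."""

    def __init__(self):
        self.deleted = set()
        self.inserted = set()

    def add(self, deleted=(), inserted=()):
        # Assumed policy: newer feedback overrides conflicting older feedback.
        self.deleted |= set(deleted)
        self.inserted -= set(deleted)
        self.inserted |= set(inserted)
        self.deleted -= set(inserted)

    def apply(self, tuples):
        # Overlay all feedback gathered so far on freshly extracted tuples.
        return (set(tuples) - self.deleted) | self.inserted


def run(docs, feedback):
    """Execute the pipeline, applying feedback before the downstream stage."""
    people = feedback.apply(extract_people(docs))
    return count_mentions(people)


docs = {1: "Alice met Bob in Paris", 2: "Bob called Alice"}
feedback = FeedbackStore()

before = run(docs, feedback)            # "Paris" is wrongly extracted as a person
feedback.add(deleted=[(1, "Paris")])    # round 1: a user removes the bad tuple
after_first = run(docs, feedback)
feedback.add(inserted=[(2, "Carol")])   # round 2: a user adds a missed tuple
after_second = run(docs, feedback)      # the round-1 correction is still in effect
```

Recomputing everything downstream of the edited relation is only the simplest way to propagate an edit; the optimization techniques mentioned in the abstract would presumably avoid such full re-execution.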