A primary challenge to large-scale data integration is creating semantic equivalences between elements from different data sources that correspond to the same real-world entity or concept. Dataspaces propose a pay-as-you-go approach: automated mechanisms such as schema matching and reference reconciliation provide initial correspondences, termed candidate matches, and then user feedback is used to incrementally confirm these matches. The key to this approach is to determine in what order to solicit user feedback for confirming candidate matches. In this paper, we develop a decision-theoretic framework for ordering candidate matches for user confirmation using the concept of the value of perfect information (VPI). At the core of this concept is a utility function that quantifies the desirability of a given state; thus, we devise a utility function for dataspaces based on query result quality. We show in practice how to efficiently apply VPI in concert with this utility function to order user confirmations. A detailed experimental evaluation on both real and synthetic datasets shows that the ordering of user feedback produced by this VPI-based approach yields a dataspace with a significantly higher utility than a wide range of other ordering strategies. Finally, we outline the design of Roomba, a system that utilizes this decision-theoretic framework to guide a dataspace in soliciting user feedback in a pay-as-you-go manner.
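The ordering idea in the abstract can be illustrated with a minimal sketch. It is not the utility function or the algorithm developed in the paper, which are not given here. Instead it assumes a deliberately simple toy utility: each candidate match has a `confidence` (treated as the probability that the match is correct) and a `weight` (a stand-in for how much query result quality depends on it). Using a correct match gains `weight`, using an incorrect one loses `penalty * weight`, and ignoring a match gains nothing. All names (`CandidateMatch`, `vpi`, `order_for_confirmation`, `penalty`) are hypothetical. Under these assumptions, the value of perfect information for a match is the expected utility once the user has confirmed or rejected it, minus the best expected utility achievable without asking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateMatch:
    name: str
    confidence: float  # assumed probability that the match is correct
    weight: float      # assumed impact of the match on query result quality


def expected_utility_without_feedback(m: CandidateMatch, penalty: float = 1.0) -> float:
    """Best expected utility when the system must decide on its own.

    Toy model (an assumption, not the paper's utility): using the match
    yields +weight if it is correct and -penalty * weight if it is wrong;
    not using it yields 0. The system picks the better of the two actions.
    """
    use_match = m.confidence * m.weight - (1.0 - m.confidence) * penalty * m.weight
    return max(use_match, 0.0)


def expected_utility_with_feedback(m: CandidateMatch) -> float:
    """Expected utility if the user first tells us whether the match is correct.

    With probability `confidence` the match is confirmed and used (+weight);
    otherwise it is rejected and dropped (0).
    """
    return m.confidence * m.weight


def vpi(m: CandidateMatch, penalty: float = 1.0) -> float:
    """Value of perfect information: the expected gain from asking the user."""
    return expected_utility_with_feedback(m) - expected_utility_without_feedback(m, penalty)


def order_for_confirmation(matches, penalty: float = 1.0):
    """Return the candidate matches sorted by decreasing VPI."""
    return sorted(matches, key=lambda m: vpi(m, penalty), reverse=True)


if __name__ == "__main__":
    candidates = [
        CandidateMatch("author ~ creator", confidence=0.95, weight=10.0),
        CandidateMatch("title ~ name", confidence=0.50, weight=4.0),
        CandidateMatch("venue ~ location", confidence=0.60, weight=1.0),
    ]
    for m in order_for_confirmation(candidates):
        print(f"{m.name}: VPI = {vpi(m):.2f}")
```

In this toy model with `penalty = 1.0`, the VPI reduces to `weight * min(confidence, 1 - confidence)`, so feedback is requested first for matches that are both uncertain and heavily used. A high-confidence match with a large weight can still rank above a low-weight uncertain one, which is the kind of trade-off a utility-driven ordering is meant to capture.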