Reducing uncertainty of schema matching via crowdsourcing

Authors:
Chen Jason Zhang;Lei Chen;H. V. Jagadish;Chen Caleb Cao
Affiliations:
Hong Kong University of Science and Technology, Hong Kong, China;Hong Kong University of Science and Technology, Hong Kong, China;University of Michigan, Ann Arbor, MI;Hong Kong University of Science and Technology, Hong Kong, China
Venue:
Proceedings of the VLDB Endowment
Year:
2013

Abstract

Schema matching is a central challenge for data integration systems. Automated tools are often uncertain about the schema matchings they suggest, and this uncertainty is inherent because a schema cannot fully capture the semantics of the data it represents. Human common sense can often help. Inspired by the popularity and success of easily accessible crowdsourcing platforms, we explore the use of crowdsourcing to reduce the uncertainty of schema matching. Since crowdsourcing platforms are typically used for simple questions, we assume that each question, termed a Correspondence Correctness Question (CCQ), asks the crowd to decide whether a given correspondence should exist in the correct matching. We propose frameworks and efficient algorithms that dynamically manage the CCQs so as to maximize the uncertainty reduction within a limited budget of questions. We develop two novel approaches, "Single CCQ" and "Multiple CCQ", which adaptively select, publish, and manage the questions. We verify the value of our solutions through simulation and a real implementation.
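
The idea of choosing a CCQ to maximize uncertainty reduction can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes that uncertainty is measured as the Shannon entropy of a probability distribution over candidate matchings, that crowd answers are always correct, and that one question is picked greedily at a time. The function names and the example attribute pairs are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def expected_entropy_after(matchings, corr):
    """Expected entropy over matchings after asking whether `corr` is correct.

    `matchings` is a list of (frozenset_of_correspondences, probability).
    Assumption: the crowd's yes/no answer is always correct, so an answer
    simply filters out the matchings that contradict it.
    """
    expected = 0.0
    for keep in (True, False):
        # Probability mass of the matchings consistent with this answer.
        consistent = [p for m, p in matchings if (corr in m) == keep]
        answer_prob = sum(consistent)
        if answer_prob <= 0:
            continue
        # Renormalize to get the posterior given this answer.
        posterior = [p / answer_prob for p in consistent]
        expected += answer_prob * entropy(posterior)
    return expected


def best_ccq(matchings):
    """Greedily pick the correspondence whose answer is expected to leave
    the least uncertainty (ties broken by sorted order)."""
    candidates = sorted(set().union(*(m for m, _ in matchings)))
    return min(candidates, key=lambda c: expected_entropy_after(matchings, c))


# Hypothetical candidate matchings between two small schemas.
matchings = [
    (frozenset({("name", "full_name"), ("phone", "tel")}), 0.5),
    (frozenset({("name", "full_name"), ("phone", "mobile")}), 0.3),
    (frozenset({("name", "contact"), ("phone", "tel")}), 0.2),
]

if __name__ == "__main__":
    question = best_ccq(matchings)
    print("Ask the crowd about:", question)
    print("Expected entropy afterwards:",
          expected_entropy_after(matchings, question))
```

Under a budget of several questions, one simple option is to repeat this selection after each answer, re-filtering the candidate matchings every time. Handling noisy crowd answers, or publishing several questions at once, would need a different model than the one assumed here.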