Aggregate Query Answering under Uncertain Schema Mappings

Authors:
Avigdor Gal, Maria Vanina Martinez, Gerardo I. Simari, V. S. Subrahmanian
Venue:
ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
Year:
2009

Abstract

Recent interest in managing uncertainty in data integration has led to the introduction of probabilistic schema mappings and the use of probabilistic methods to answer queries across multiple databases under two semantics: by-table and by-tuple. In this paper, we develop three possible semantics for aggregate queries (the range, distribution, and expected value semantics) and show that they combine with the by-table and by-tuple semantics in six ways. We present algorithms to process COUNT, AVG, SUM, MIN, and MAX queries under all six semantics and establish results on the complexity of processing such queries under each of them. We show that computing COUNT is in PTIME for all six semantics and that computing SUM is in PTIME for all but the by-tuple/distribution semantics. Finally, we show that AVG, MIN, and MAX are PTIME-computable for all by-table semantics and for the by-tuple/range semantics.

We developed a prototype implementation and experimented with both real-world traces and simulated data. We show that, as expected, naive processing of aggregates does not scale beyond small databases with a small number of mappings. The results also show that the polynomial-time algorithms scale to several million tuples and to a large number of mappings.
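To make the six combinations concrete for COUNT, here is a minimal Python sketch under a toy model assumed purely for illustration; it is not the paper's algorithm or prototype. The assumptions: a single mediated attribute `price` whose source column is uncertain, a hypothetical selection `price > 100`, and a p-mapping given as (column, probability) pairs. By-table is read as "one mapping applies to the whole table" and by-tuple as "each tuple independently picks a mapping". The range is read as the interval [min, max] of possible COUNT values. All names (`ROWS`, `P_MAPPING`, `THRESHOLD`, and the functions) are hypothetical. The brute-force by-tuple enumerator visits m^n worlds, which is the kind of naive processing that does not scale. The dynamic program instead treats each tuple's selection as an independent event, so COUNT follows a Poisson-binomial distribution computable in polynomial time.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Hypothetical toy source table: the mediated attribute "price" may map to
# either source column; which one is correct is uncertain.
ROWS = [
    {"listPrice": 120, "salePrice": 90},
    {"listPrice": 80, "salePrice": 70},
    {"listPrice": 150, "salePrice": 130},
    {"listPrice": 90, "salePrice": 110},
]

# Hypothetical p-mapping: (source column used for "price", probability).
P_MAPPING = [("listPrice", Fraction(3, 5)), ("salePrice", Fraction(2, 5))]

THRESHOLD = 100  # hypothetical query: COUNT of tuples with price > 100


def satisfies(row, col):
    """True if the row passes the selection when "price" maps to col."""
    return row[col] > THRESHOLD


def by_table_count():
    """By-table: one mapping applies to the whole table (m possible worlds)."""
    dist = defaultdict(Fraction)
    for col, p in P_MAPPING:
        dist[sum(satisfies(r, col) for r in ROWS)] += p
    return dict(dist)


def by_tuple_count_naive():
    """By-tuple, brute force: every tuple picks its own mapping (m**n worlds)."""
    dist = defaultdict(Fraction)
    for choice in product(P_MAPPING, repeat=len(ROWS)):
        prob, count = Fraction(1), 0
        for row, (col, p) in zip(ROWS, choice):
            prob *= p
            count += satisfies(row, col)
        dist[count] += prob
    return dict(dist)


def selection_probs():
    """q_t: probability that tuple t is selected, summed over mappings."""
    return [sum((p for col, p in P_MAPPING if satisfies(r, col)), Fraction(0))
            for r in ROWS]


def by_tuple_count_dp():
    """By-tuple, polynomial: tuples are selected independently, so COUNT is
    Poisson-binomial; the DP uses O(n^2) arithmetic operations."""
    dist = {0: Fraction(1)}
    for q in selection_probs():
        new = defaultdict(Fraction)
        for k, pk in dist.items():
            new[k] += pk * (1 - q)  # tuple not selected
            new[k + 1] += pk * q    # tuple selected
        dist = new
    return {k: v for k, v in dist.items() if v != 0}


def by_tuple_count_range():
    """Range without the distribution: certain tuples vs. possible tuples."""
    qs = selection_probs()
    return sum(q == 1 for q in qs), sum(q > 0 for q in qs)


def by_tuple_count_expected():
    """Expected value via linearity of expectation: the sum of q_t."""
    return sum(selection_probs(), Fraction(0))


def summarize(dist):
    """(range, expected value) from a distribution whose keys all have p > 0."""
    return (min(dist), max(dist)), sum(v * p for v, p in dist.items())
```

With these rows, by-table semantics yields COUNT = 2 with certainty, while by-tuple semantics yields the range [1, 3] and the distribution {1: 6/25, 2: 13/25, 3: 6/25}. The expected value is 2 in both cases, as linearity of expectation guarantees for COUNT in this toy model. The naive enumerator and the DP agree, but only the DP avoids the exponential number of by-tuple worlds.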