Conditioning and aggregating uncertain data streams: going beyond expectations

Authors:
Thanh T. L. Tran;Andrew McGregor;Yanlei Diao;Liping Peng;Anna Liu
Affiliations:
University of Massachusetts, Amherst;University of Massachusetts, Amherst;University of Massachusetts, Amherst;University of Massachusetts, Amherst;University of Massachusetts, Amherst
Venue:
Proceedings of the VLDB Endowment
Year:
2010

Abstract

Uncertain data streams are increasingly common in real-world deployments, and monitoring applications require the evaluation of complex queries on such streams. In this paper, we consider complex queries involving conditioning (e.g., selections and group-bys) and aggregation operations on uncertain data streams. To characterize the uncertainty of answers to these queries, one generally has to compute the full probability distribution of the result of each operation used in the query. Computing distributions of aggregates given conditioned tuple distributions is a hard, unsolved problem. Our work employs a new evaluation framework that includes a general data model, approximation metrics, and approximate representations. Within this framework, we design fast data-stream algorithms, both deterministic and randomized, that return approximate distributions with bounded errors as answers to those complex queries. Our experimental results demonstrate the accuracy and efficiency of our approximation techniques and offer insights into the strengths and limitations of deterministic and randomized algorithms.
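
To make the problem statement concrete, the following is a minimal illustrative sketch, not the paper's algorithm. It assumes independent tuples with small discrete value distributions and computes the exact distribution of a SUM after a selection predicate conditions each tuple. The names `condition`, `sum_distribution`, and the sample data are hypothetical. The exact result can grow quickly in size as tuples arrive, which is one reason approximate representations are attractive.

```python
from collections import defaultdict


def condition(dist, predicate):
    """Condition a discrete distribution {value: prob} on a predicate.

    Returns (p, cond): p is the probability that the tuple satisfies the
    predicate, and cond is the value distribution renormalized given that it does.
    """
    p = sum(pr for v, pr in dist.items() if predicate(v))
    if p == 0:
        return 0.0, {}
    return p, {v: pr / p for v, pr in dist.items() if predicate(v)}


def sum_distribution(tuples, predicate):
    """Exact distribution of SUM over the tuples that pass the predicate.

    Assumes tuples are independent (an assumption of this sketch).
    """
    result = {0: 1.0}
    for dist in tuples:
        p, cond = condition(dist, predicate)
        nxt = defaultdict(float)
        for s, ps in result.items():
            if p < 1.0:
                nxt[s] += ps * (1.0 - p)  # tuple is filtered out, sum unchanged
            for v, pv in cond.items():
                nxt[s + v] += ps * p * pv  # tuple passes and contributes v
        result = dict(nxt)
    return result


# Hypothetical stream of two uncertain readings; keep readings >= 2.
readings = [{1: 0.5, 3: 0.5}, {2: 0.25, 4: 0.75}]
answer = sum_distribution(readings, lambda v: v >= 2)
expectation = sum(v * pr for v, pr in answer.items())
```

Here `answer` holds the full distribution of the aggregate, while `expectation` collapses it to a single number. The distribution also exposes the spread and the tails that the expectation alone hides.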