PROUD: a probabilistic approach to processing similarity queries over uncertain data streams

Authors:
Mi-Yen Yeh;Kun-Lung Wu;Philip S. Yu;Ming-Syan Chen
Affiliations:
National Taiwan University;IBM T.J. Watson Research Center;University of Illinois at Chicago;National Taiwan University
Venue:
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Year:
2009

Citing 27
Cited 10

Wavelet-based histograms for selectivity estimation

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
One-Pass Wavelet Decompositions of Data Streams

IEEE Transactions on Knowledge and Data Engineering
Evaluating probabilistic queries over imprecise data

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Similarity Search Over Time-Series Data Using Wavelets

ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Wavelet synopsis for data streams: minimizing non-euclidean error

Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
One-pass wavelet synopses for maximum-error metrics

VLDB '05 Proceedings of the 31st international conference on Very large data bases
Indexing multi-dimensional uncertain data with arbitrary probability density functions

VLDB '05 Proceedings of the 31st international conference on Very large data bases
Working Models for Uncertain Data

ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
The Gauss-Tree: Efficient Object Identification in Databases of Probabilistic Feature Vectors

ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
ULDBs: databases with uncertainty and lineage

VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Efficient range-constrained similarity search on wavelet synopses over multiple streams

CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Sketching probabilistic data streams

Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Estimating statistical aggregates on probabilistic data streams

Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
StatStream: statistical monitoring of thousands of data streams in real time

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Efficient query evaluation on probabilistic databases

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Efficient indexing methods for probabilistic threshold queries over uncertain data

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Probabilistic skylines on uncertain data

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
LeeWave: level-wise distribution of wavelet coefficients for processing kNN queries over distributed streams

Proceedings of the VLDB Endowment
Efficiently Answering Probabilistic Threshold Top-k Queries on Uncertain Data

ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Efficient Processing of Top-k Queries in Uncertain Databases

ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
On High Dimensional Indexing of Uncertain Data

ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Pattern Matching over Cloaked Time Series

ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
On Unifying Privacy and Uncertain Data Models

ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Probabilistic Verifiers: Evaluating Constrained Nearest-Neighbor Queries over Uncertain Data

ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Fast and Simple Relational Processing of Uncertain Data

ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Exploiting Lineage for Confidence Computation in Uncertain and Probabilistic Databases

ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Fast approximate wavelet tracking on streams

EDBT'06 Proceedings of the 10th international conference on Advances in Database Technology

Efficient join processing on uncertain data streams

Proceedings of the 18th ACM conference on Information and knowledge management
DUST: a generalized notion of similarity between uncertain time series

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
DUST: a generalized notion of similarity between uncertain time series

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Top-k similarity search on uncertain trajectories

SSDBM'11 Proceedings of the 23rd international conference on Scientific and statistical database management
Similarity matching for uncertain time series: analytical and experimental comparison

Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Querying and Mining Uncertain Spatio-Temporal Data
Managing uncertain spatio-temporal data

Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Querying and Mining Uncertain Spatio-Temporal Data
Uncertain time-series similarity: return to the basics

Proceedings of the VLDB Endowment
Indexing uncertain spatio-temporal data

Proceedings of the 21st ACM international conference on Information and knowledge management
Probabilistic distance based abnormal pattern detection in uncertain series data

Knowledge-Based Systems
A probabilistic approach to correlation queries in uncertain time series data

Proceedings of the 21st ACM international conference on Information and knowledge management

Quantified Score

Hi-index	0.00

Visualization

Abstract

We present PROUD -- A PRObabilistic approach to processing similarity queries over Uncertain Data streams, where the data streams here are mainly time series streams. In contrast to data with certainty, an uncertain series is an ordered sequence of random variables. The distance between two uncertain series is also a random variable. We use a general uncertain data model, where only the mean and the deviation of each random variable at each timestamp are available. We derive mathematical conditions for progressively pruning candidates to reduce the computation cost. We then apply PROUD to a streaming environment where only sketches of streams, like wavelet synopses, are available. Extensive experiments are conducted to evaluate the effectiveness of PROUD and compare it with Det, a deterministic approach that directly processes data without considering uncertainty. The results show that, compared with Det, PROUD offers a flexible trade-off between false positives and false negatives by controlling a threshold, while maintaining a similar computation cost. In contrast, Det does not provide such flexibility. This trade-off is important as in some applications false negatives are more costly, while in others, it is more critical to keep the false positives low.