Global optimization of histograms
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Range Selectivity Estimation for Continuous Attributes
SSDBM '99 Proceedings of the 11th International Conference on Scientific and Statistical Database Management
ItCompress: An Iterative Semantic Compression Algorithm
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Indexing spatio-temporal trajectories with Chebyshev polynomials
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Deterministic wavelet thresholding for maximum-error metrics
PODS '04 Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Wavelet synopsis for data streams: minimizing non-euclidean error
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
ULDBs: databases with uncertainty and lineage
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Efficient join processing over uncertain data
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Management of probabilistic data: foundations and challenges
Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Exploiting duality in summarization with deterministic guarantees
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Efficient query evaluation on probabilistic databases
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
MCDB: a monte carlo approach to managing uncertain data
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Proceedings of the VLDB Endowment
Database Support for Probabilistic Attributes and Tuples
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Histograms and Wavelets on Probabilistic Data
ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
Probabilistic histograms for probabilistic data
Proceedings of the VLDB Endowment
Hi-index | 0.00 |
Many real world applications produce data with uncertainties drawn from measurements over a continuous domain space. Recent research in the area of probabilistic databases has mainly focused on managing and querying discrete data in which the domain is limited to a small number of values (i.e. on the order of 10). When the size of the domain increases, current methods fail due to their nature of explicitly storing each value/probability pair. Such methods are not capable of extending their use to continuous-valued attributes. In this paper, we provide a scalable, accurate, space efficient probabilistic data synopsis for uncertain attributes defined over a continuous domain. Our synopsis construction methods are all error-aware to ensure that our synopsis provides an accurate representation of the underlying data given a limited space budget. Additionally, we are able to provide approximate query results over the synopsis with error bounds. We provide an extensive experimental evaluation to show that our proposed methods improve upon the current state of the art in terms of construction time and query accuracy. In particular, our synopsis can be constructed in O(N2) time (where N is the number of tuples in the database). We also demonstrate the ability of our synopsis to answer a variety of interesting queries on a real data set and show that our query error is reduced by up to an order of magnitude over the previous state-of-the-art method.