Efficient indexing methods for probabilistic threshold queries over uncertain data

Authors:
Reynold Cheng;Yuni Xia;Sunil Prabhakar;Rahul Shah;Jeffrey Scott Vitter
Affiliations:
Department of Computer Sciences, Purdue University, West Lafayette, IN;Department of Computer Sciences, Purdue University, West Lafayette, IN;Department of Computer Sciences, Purdue University, West Lafayette, IN;Department of Computer Sciences, Purdue University, West Lafayette, IN;Department of Computer Sciences, Purdue University, West Lafayette, IN
Venue:
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Year:
2004

Abstract

It is infeasible for a sensor database to contain the exact value of each sensor at all points in time. This uncertainty is inherent in these systems due to measurement and sampling errors and resource limitations. To avoid drawing erroneous conclusions from stale data, it has recently been proposed to use uncertainty intervals that model each data item as a range with an associated probability density function (pdf) rather than as a single value. Querying such uncertain data introduces imprecision into answers, in the form of probability values that specify the likelihood that an answer satisfies the query. These queries are more expensive to evaluate than their traditional counterparts, but their answers are guaranteed to be correct and are more informative because of the accompanying probabilities. Although the answer probabilities are useful, many applications only need to know whether the probability exceeds a given threshold; we term such queries Probabilistic Threshold Queries (PTQ). In this paper we address the efficient computation of these queries. In particular, we develop two index structures and associated algorithms to answer PTQs efficiently. The first index scheme augments an R-tree with uncertainty information. We establish the difficulty of this problem by mapping one-dimensional intervals to a two-dimensional space, and show that interval indexing with probabilities is significantly harder than plain interval indexing, which is considered a well-studied problem. To overcome the limitations of this R-tree-based structure, we apply a technique we call variance-based clustering, in which data points with similar degrees of uncertainty are clustered together. Our extensive index structure can answer the queries for various kinds of uncertainty pdfs, in an almost optimal sense. We conduct experiments to validate the superior performance of both indexing schemes.
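To make the query semantics concrete, here is a minimal sketch of a probabilistic threshold range query evaluated by a plain linear scan. It is not the indexing method of the paper: it only illustrates what a PTQ returns. The names `UncertainItem`, `appearance_probability` and `threshold_query` are hypothetical, the pdf is assumed to be uniform over each uncertainty interval, and the threshold test is assumed to be "at least p".

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class UncertainItem:
    """Hypothetical record: the true value lies somewhere in [lo, hi].

    Assumption: the pdf is uniform over the interval.
    """
    name: str
    lo: float
    hi: float


def appearance_probability(item: UncertainItem, a: float, b: float) -> float:
    """Probability that the item's value falls inside the query range [a, b]."""
    if item.hi == item.lo:
        # Degenerate interval: the value is known exactly.
        return 1.0 if a <= item.lo <= b else 0.0
    # Length of the overlap between [lo, hi] and [a, b]; negative means disjoint.
    overlap = min(item.hi, b) - max(item.lo, a)
    return max(0.0, overlap) / (item.hi - item.lo)


def threshold_query(items: List[UncertainItem], a: float, b: float, p: float) -> List[str]:
    """Names of items whose probability of lying in [a, b] is at least p.

    Assumption: the threshold test is ">= p". This is a linear scan with no index.
    """
    return [it.name for it in items if appearance_probability(it, a, b) >= p]


readings = [
    UncertainItem("s1", 10.0, 20.0),  # half of this interval overlaps [15, 25]
    UncertainItem("s2", 18.0, 22.0),  # fully contained in [15, 25]
    UncertainItem("s3", 30.0, 40.0),  # disjoint from [15, 25]
]

result = threshold_query(readings, 15.0, 25.0, 0.6)
print(result)
```

With the threshold at 0.6 only `s2` qualifies; lowering it to 0.5 also admits `s1`. The scan evaluates every item, which is the cost an index for PTQs aims to avoid.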