Probabilistic similarity join on uncertain data

Authors:
Hans-Peter Kriegel; Peter Kunath; Martin Pfeifle; Matthias Renz
Affiliations:
University of Munich, Germany; University of Munich, Germany; University of Munich, Germany; University of Munich, Germany
Venue:
DASFAA'06 Proceedings of the 11th international conference on Database Systems for Advanced Applications
Year:
2006

Abstract

The similarity join is an important database primitive for feature databases. It combines two datasets into one result set of object pairs, drawn from the two inputs, that satisfy a similarity predicate. In many application areas, e.g. sensor databases, location-based services, or face recognition systems, distances between objects have to be computed from vague and uncertain data. In this paper, we propose to express the similarity between two uncertain objects by probability density functions that assign a probability value to each possible distance value. Integrating these probabilistic distance functions directly into the join algorithms exploits the full information they provide. The resulting probabilistic similarity join assigns each object pair a probability value indicating the likelihood that the pair belongs to the result set. Because computing these probability values is very expensive, we introduce an efficient join processing strategy, using the distance-range join as an example. A detailed experimental evaluation demonstrates the benefits of our probabilistic similarity join: it achieves high-quality join results at rather low computational cost.
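
To make the idea of a probabilistic distance-range join concrete, here is a minimal sketch. It is not the efficient processing strategy the paper proposes; it is a naive nested-loop illustration. It assumes each uncertain object is given as a small set of equally weighted sample points, and the names `pair_probability`, `probabilistic_range_join`, `eps`, and `min_prob` are hypothetical, chosen only for this example. The probability that a pair belongs to the result is estimated as the fraction of sample combinations whose Euclidean distance is at most `eps`.

```python
import math
from itertools import product


def pair_probability(samples_a, samples_b, eps):
    """Estimate P(dist(a, b) <= eps) from equally weighted samples of each object.

    Assumption for this sketch: the samples approximate each object's
    uncertainty, and the two objects are independent.
    """
    hits = sum(
        1 for p, q in product(samples_a, samples_b) if math.dist(p, q) <= eps
    )
    return hits / (len(samples_a) * len(samples_b))


def probabilistic_range_join(r, s, eps, min_prob=0.0):
    """Naive nested-loop join over two dicts mapping object id -> sample list.

    Returns (id_r, id_s, probability) for every pair whose estimated
    probability is strictly greater than min_prob.
    """
    result = []
    for id_r, samples_r in r.items():
        for id_s, samples_s in s.items():
            prob = pair_probability(samples_r, samples_s, eps)
            if prob > min_prob:
                result.append((id_r, id_s, prob))
    return result


# Toy data: each uncertain object is represented by 2-D sample points.
R = {"r1": [(0.0, 0.0), (1.0, 0.0)]}
S = {"s1": [(0.0, 1.0), (3.0, 0.0)], "s2": [(10.0, 10.0)]}

pairs = probabilistic_range_join(R, S, eps=1.5)
print(pairs)  # [('r1', 's1', 0.5)]
```

In this toy input, two of the four sample combinations of `r1` and `s1` lie within distance 1.5, so the pair is reported with probability 0.5, while `s2` is too far away in every combination and is dropped. The cost grows with the product of the sample counts for every object pair, which illustrates why the abstract calls these probability computations expensive.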