The Gauss-Tree: Efficient Object Identification in Databases of Probabilistic Feature Vectors

Authors:
Christian Böhm; Alexey Pryakhin; Matthias Schubert
Affiliations:
University of Munich, Germany; University of Munich, Germany; University of Munich, Germany
Venue:
ICDE '06: Proceedings of the 22nd International Conference on Data Engineering
Year:
2006

Abstract

In biometric database applications, the typical task is to identify individuals by features that are not known exactly. This inexactness stems from varying measurement techniques or environmental conditions. Because these conditions are not necessarily the same when the features of different individuals are determined, the exactness may vary strongly both between individuals and between features. Similarity search on feature vectors can be used to identify individuals, but even adaptable distance measures cannot handle objects that each have an individual level of exactness. Therefore, we develop a comprehensive probabilistic theory in which uncertain observations are modeled by probabilistic feature vectors (pfv), i.e., feature vectors in which the conventional feature values are replaced by Gaussian probability density functions. Each feature value of each object is complemented by a variance value indicating its uncertainty. We define two types of identification queries: k-most-likely identification and threshold identification. For efficient query processing, we propose a novel index structure, the Gauss-tree. Our experimental evaluation demonstrates that pfv stored in a Gauss-tree yield significantly better result quality than traditional feature vectors. Additionally, we show that the Gauss-tree significantly reduces query times compared to competing methods.
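The abstract does not spell out how a match probability between a query and a stored pfv is computed. The following is a minimal Python sketch of the two query types over a plain list of pfv, under assumptions that are ours rather than the paper's: features are independent, the density that a query pfv and a database pfv describe the same individual is taken per feature as a Gaussian whose variance is the sum of the two variances, and the identification probability P(v | q) is obtained by Bayes' rule with a uniform prior over the database. The `PFV` class, the sample data, and all function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PFV:
    """Hypothetical probabilistic feature vector: one (mean, std dev) per feature."""
    name: str
    means: List[float]
    sigmas: List[float]


def log_match_density(q: PFV, v: PFV) -> float:
    """Log-density that q and v describe the same individual.

    Assumes independent features and Gaussian errors, so per feature the
    difference of the two means is Gaussian with variance sq^2 + sv^2.
    """
    total = 0.0
    for mq, sq, mv, sv in zip(q.means, q.sigmas, v.means, v.sigmas):
        var = sq * sq + sv * sv
        total += -0.5 * math.log(2.0 * math.pi * var) - (mq - mv) ** 2 / (2.0 * var)
    return total


def identification_probs(q: PFV, db: List[PFV]) -> List[Tuple[str, float]]:
    """P(v | q) for every v in db via Bayes' rule with a uniform prior (assumption)."""
    if not db:
        return []
    logs = [log_match_density(q, v) for v in db]
    m = max(logs)  # subtract the maximum before exp() to avoid underflow
    weights = [math.exp(l - m) for l in logs]
    z = sum(weights)
    return [(v.name, w / z) for v, w in zip(db, weights)]


def k_most_likely(q: PFV, db: List[PFV], k: int) -> List[Tuple[str, float]]:
    """The k objects with the highest identification probability."""
    ranked = sorted(identification_probs(q, db), key=lambda t: t[1], reverse=True)
    return ranked[:k]


def threshold_identification(q: PFV, db: List[PFV], p_min: float) -> List[Tuple[str, float]]:
    """All objects whose identification probability is at least p_min."""
    return [(n, p) for n, p in identification_probs(q, db) if p >= p_min]


# Hypothetical example data: two features per individual.
db = [
    PFV("alice", [1.0, 2.0], [0.1, 0.2]),
    PFV("bob", [1.5, 2.5], [0.5, 0.5]),
    PFV("carol", [4.0, 0.0], [0.3, 0.3]),
]
q = PFV("query", [1.1, 2.1], [0.1, 0.1])

print(k_most_likely(q, db, 2))
print(threshold_identification(q, db, 0.5))
```

The sketch scans every stored object for each query; avoiding such a full scan is the purpose of an index like the Gauss-tree, whose structure is not reproduced here. Because each object carries its own variances, a precisely measured object (small sigma) that lies slightly off the query can still outrank a vaguely measured one, which a fixed distance measure would not capture.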