On High Dimensional Indexing of Uncertain Data

Authors:
Charu C. Aggarwal; Philip S. Yu
Affiliations:
IBM T. J. Watson Research Center, 19 Skyline Drive, Hawthorne, NY 10532, USA (charu@us.ibm.com; psyu@us.ibm.com)
Venue:
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Year:
2008

Cited by (10):

Monochromatic and bichromatic reverse skyline search over uncertain databases. Proceedings of the 2008 ACM SIGMOD international conference on Management of data.
PROUD: a probabilistic approach to processing similarity queries over uncertain data streams. Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology.
Efficient processing of probabilistic reverse nearest neighbor queries over uncertain data. The VLDB Journal: The International Journal on Very Large Data Bases.
Multi-agent system for customer relationship management with SVMs tool. International Journal of Intelligent Information and Database Systems.
Can shared-neighbor distances defeat the curse of dimensionality? SSDBM'10 Proceedings of the 22nd international conference on Scientific and statistical database management.
Subspace similarity search: efficient k-NN queries in arbitrary subspaces. SSDBM'10 Proceedings of the 22nd international conference on Scientific and statistical database management.
MUD: Mapping-based query processing for high-dimensional uncertain data. Information Sciences: an International Journal.
DuoWave: Mitigating the curse of dimensionality for uncertain data. Data & Knowledge Engineering.
Effectively indexing the multi-dimensional uncertain objects for range searching. Proceedings of the 15th International Conference on Extending Database Technology.
A survey on unsupervised outlier detection in high-dimensional numerical data. Statistical Analysis and Data Mining.

Abstract

In this paper, we examine the problem of computing distance functions and indexing uncertain data in high dimensionality for nearest-neighbor and range queries. Because of the inherent noise in uncertain data, traditional distance measures such as the L_q-metric and their probabilistic variants are not qualitatively effective. The problem is further magnified by data sparsity in high dimensionality. We examine methods of computing distance functions for high-dimensional data that are both qualitatively effective and index-friendly, and we show how to construct an effective index structure to handle similarity and range queries over uncertain data in high dimensionality. Typical range queries in high-dimensional space specify ranges along only a subset of the dimensions. Furthermore, it is often desirable to run similarity queries on only a subset of the many available dimensions. Such queries are difficult to resolve with traditional index structures, which use the entire set of dimensions. We propose query-processing techniques that apply effective search methods on the index to compute the final results. We present experimental results on a number of real and synthetic data sets, in terms of both effectiveness and efficiency, and show that the proposed distance measures are not only more effective than traditional L_q-norms but can also be computed more efficiently over the proposed index structure.
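To make two of the abstract's concerns concrete, here is a minimal sketch under a hypothetical uncertainty model: each object is stored as independent per-dimension Gaussians with means `mu` and standard deviations `sigma`. This model and the function names `subspace_range_probability` and `expected_sq_distance` are illustrative assumptions, not the method proposed in the paper. The sketch shows (a) a range query that constrains only a subset of the dimensions, and (b) how a probabilistic variant of the squared L_2 distance picks up a per-dimension noise term that does not depend on the query and accumulates as dimensionality grows.

```python
import math
from typing import Dict, Sequence, Tuple


def gaussian_cdf(x: float, mean: float, std: float) -> float:
    """P(X <= x) for X ~ N(mean, std^2); requires std > 0."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def subspace_range_probability(
    mu: Sequence[float],
    sigma: Sequence[float],
    ranges: Dict[int, Tuple[float, float]],
) -> float:
    """Probability that an uncertain object lies inside a query box that
    constrains only the dimensions listed in `ranges` (dimension -> (lo, hi)).
    Unlisted dimensions are unconstrained. Assumes independent dimensions
    (hypothetical model, not taken from the paper)."""
    prob = 1.0
    for d, (lo, hi) in ranges.items():
        if sigma[d] == 0.0:
            # A certain (noise-free) value: inside the closed range or not.
            prob *= 1.0 if lo <= mu[d] <= hi else 0.0
        else:
            prob *= gaussian_cdf(hi, mu[d], sigma[d]) - gaussian_cdf(lo, mu[d], sigma[d])
    return prob


def expected_sq_distance(
    mu: Sequence[float],
    sigma: Sequence[float],
    q: Sequence[float],
    dims: Sequence[int],
) -> float:
    """E[sum_{d in dims} (X_d - q_d)^2] = sum (mu_d - q_d)^2 + sigma_d^2.
    The sigma_d^2 terms are independent of the query point q, so they
    accumulate with the number of dimensions used, however close the means are."""
    return sum((mu[d] - q[d]) ** 2 + sigma[d] ** 2 for d in dims)


if __name__ == "__main__":
    mu, sigma = [0.0, 5.0, 1.0], [1.0, 2.0, 0.5]
    # Range query on dimension 0 only; dimensions 1 and 2 are ignored.
    print(subspace_range_probability(mu, sigma, {0: (-1.0, 1.0)}))
    # Query point equal to the means: the whole distance is the noise term.
    print(expected_sq_distance(mu, sigma, mu, [0, 1, 2]))
```

With a query point equal to an object's means and `sigma = 1` in every dimension, `expected_sq_distance` returns the dimensionality itself (100.0 for 100 dimensions). This illustrates why an uncertainty-unaware or naively probabilistic L_q measure can be dominated by noise in high dimensions, which is the effect the abstract attributes to noise and sparsity.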