Dimensionality reduction for similarity search with the Euclidean distance in high-dimensional applications

Authors:
Seungdo Jeong;Sang-Wook Kim;Byung-Uk Choi
Affiliations:
Department of Electronics and Computer Engineering, Hanyang University, Seoul, South Korea;Department of Electronics and Computer Engineering, Hanyang University, Seoul, South Korea;Department of Electronics and Computer Engineering, Hanyang University, Seoul, South Korea
Venue:
Multimedia Tools and Applications
Year:
2009

Abstract

In multimedia information retrieval, multimedia data are represented as vectors in high-dimensional space. A variety of indexing methods have been proposed to search these vectors efficiently. However, the performance of these indexing methods degrades dramatically as dimensionality increases, a phenomenon known as the dimensionality curse. Dimensionality reduction methods have been proposed to resolve it: they map feature vectors in high-dimensional space to vectors in low-dimensional space before the data are indexed. This paper proposes a novel dimensionality reduction method built on a function that approximates the Euclidean distance from the norm and angle components of a vector. First, we identify the causes of errors in angle approximation that arise when approximating the Euclidean distance, and discuss basic solutions to them. Then, we propose a new method for dimensionality reduction that extracts a set of subvectors from a feature vector and maintains only the norm and the approximated angle for every subvector. Because the angle of each subvector is measured against a reference vector, selecting a good reference vector is crucial for accurate approximation of the angle component. We present criteria for a good reference vector and propose a method that chooses one. We also define a novel distance function using the norm and angle components, and formally prove that it consistently lower-bounds the Euclidean distance. This implies that retrieval with this function incurs no false dismissals. Finally, the superiority of the proposed approach is verified via extensive experiments with synthetic and real-life data sets.
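
The abstract does not state the distance function itself, so the following is only a minimal sketch of the general idea, under stated assumptions, and not the authors' definition. It assumes contiguous, nearly equal-length subvectors, a single hypothetical reference vector split the same way, and a standard bound: if two subvectors make angles ta and tb with the same reference, the angle between them is at least |ta - tb|, so na^2 + nb^2 - 2*na*nb*cos(ta - tb) cannot exceed their squared Euclidean distance. All function names here are illustrative.

```python
import math
import random


def split(vec, k):
    """Split vec into k contiguous subvectors of nearly equal length (an assumption)."""
    n = len(vec)
    bounds = [round(i * n / k) for i in range(k + 1)]
    return [vec[bounds[i]:bounds[i + 1]] for i in range(k)]


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def angle(v, ref):
    """Angle in [0, pi] between v and ref; 0.0 if either vector is all zeros."""
    nv, nr = norm(v), norm(ref)
    if nv == 0.0 or nr == 0.0:
        return 0.0
    c = sum(a * b for a, b in zip(v, ref)) / (nv * nr)
    return math.acos(max(-1.0, min(1.0, c)))  # clamp against rounding error


def reduce_vector(vec, reference, k):
    """Keep only (norm, angle to the reference subvector) for each of k subvectors."""
    pairs = zip(split(vec, k), split(reference, k))
    return [(norm(s), angle(s, r)) for s, r in pairs]


def lower_bound(ra, rb):
    """Lower bound on the Euclidean distance, computed from two reduced vectors."""
    total = 0.0
    for (na, ta), (nb, tb) in zip(ra, rb):
        total += na * na + nb * nb - 2.0 * na * nb * math.cos(ta - tb)
    return math.sqrt(max(0.0, total))


def euclidean(a, b):
    """Exact Euclidean distance, used only for comparison."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    rng = random.Random(0)
    dim, k = 16, 4
    reference = [1.0] * dim  # hypothetical reference vector, chosen arbitrarily
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    ra = reduce_vector(a, reference, k)
    rb = reduce_vector(b, reference, k)
    print("lower bound:", lower_bound(ra, rb))
    print("exact      :", euclidean(a, b))
```

In this sketch a 16-dimensional vector is stored as 4 (norm, angle) pairs. Because the bound never exceeds the exact distance, a range query that filters on `lower_bound` and then verifies the survivors with `euclidean` cannot discard a true answer. The arbitrary all-ones reference is used only for illustration; the abstract stresses that the choice of reference vector strongly affects how tight the approximation is.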