High dimensional nearest neighbor searching

Authors:
Hakan Ferhatosmanoglu;Ertem Tuncel;Divyakant Agrawal;Amr El Abbadi
Affiliations:
Computer Science and Engineering, Ohio State University, Columbus, OH;Electrical Engineering, University of California, Riverside;Computer Science, University of California, Santa Barbara;Computer Science, University of California, Santa Barbara
Venue:
Information Systems
Year:
2006

Citing 46
Cited 8

Discrete-time signal processing

Discrete-time signal processing
The design and analysis of spatial data structures

The design and analysis of spatial data structures
The R*-tree: an efficient and robust access method for points and rectangles

SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data
The hB-tree: a multiattribute indexing method with good guaranteed performance

ACM Transactions on Database Systems (TODS)
Vector quantization and signal compression

Vector quantization and signal compression
Improving text retrieval for the routing problem using latent semantic indexing

SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Efficient and effective querying by image content

Journal of Intelligent Information Systems - Special issue: advances in visual information management systems
Fast subsequence matching in time-series databases

SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data
Approximate range searching

Proceedings of the eleventh annual symposium on Computational geometry
Nearest neighbor queries

SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
Digital image processing

Digital image processing
Texture Features for Browsing and Retrieval of Image Data

IEEE Transactions on Pattern Analysis and Machine Intelligence
S3: similarity search in CAD database systems

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Scalable access within the context of digital libraries

IEEE ADL '97 Proceedings of the IEEE international forum on Research and technology advances in digital libraries
A cost model for nearest neighbor search in high-dimensional data space

PODS '97 Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Principles of multimedia database systems

Principles of multimedia database systems
The pyramid-technique: towards breaking the curse of dimensionality

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Dimensionality reduction for similarity searching in dynamic databases

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Multidimensional access methods

ACM Computing Surveys (CSUR)
An optimal algorithm for approximate nearest neighbor searching

SODA '94 Proceedings of the fifth annual ACM-SIAM symposium on Discrete algorithms
The Grid File: An Adaptable, Symmetric Multikey File Structure

ACM Transactions on Database Systems (TODS)
Vector approximation based indexing for non-uniform high dimensional data sets

Proceedings of the ninth international conference on Information and knowledge management
Dimensionality reduction and similarity computation by inner product approximations

Proceedings of the ninth international conference on Information and knowledge management
Locally adaptive dimensionality reduction for indexing large time series databases

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Efficient k-NN search on vertically decomposed data

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
The K-D-B-tree: a search structure for large multidimensional dynamic indexes

SIGMOD '81 Proceedings of the 1981 ACM SIGMOD international conference on Management of data
R-trees: a dynamic index structure for spatial searching

SIGMOD '84 Proceedings of the 1984 ACM SIGMOD international conference on Management of data
The TV-tree: an index structure for high-dimensional data

The VLDB Journal — The International Journal on Very Large Data Bases - Spatial Database Systems
Efficient Similarity Search In Sequence Databases

FODO '93 Proceedings of the 4th International Conference on Foundations of Data Organization and Algorithms
Similarity Indexing with the SS-tree

ICDE '96 Proceedings of the Twelfth International Conference on Data Engineering
Approximate Nearest Neighbor Searching in Multimedia Databases

Proceedings of the 17th International Conference on Data Engineering
A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces

VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Similarity Search in High Dimensions via Hashing

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
The A-tree: An Index Structure for High-Dimensional Spaces Using Relative Approximation

VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Contrast Plots and P-Sphere Trees: Space vs. Time in Nearest Neighbour Searches

VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Indexing the Distance: An Efficient Method to KNN Processing

Proceedings of the 27th International Conference on Very Large Data Bases
Fast Nearest Neighbor Search in Medical Image Databases

VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
The X-tree: An Index Structure for High-Dimensional Data

VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
Efficient User-Adaptable Similarity Search in Large Multimedia Databases

VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Approximate similarity retrieval with M-trees

The VLDB Journal — The International Journal on Very Large Data Bases
Non-linear dimensionality reduction techniques for classification and visualization

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
The Hybrid Tree: An Index Structure for High Dimensional Feature Spaces

ICDE '99 Proceedings of the 15th International Conference on Data Engineering
PAC Nearest Neighbor Queries: Approximate and Controlled Search in High-Dimensional and Metric Spaces

ICDE '00 Proceedings of the 16th International Conference on Data Engineering
Independent Quantization: An Index Compression Technique for High-Dimensional Data Spaces

ICDE '00 Proceedings of the 16th International Conference on Data Engineering
Dimensionality reduction using magnitude and shape approximations

CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
LDC: Enabling Search By Partial Distance In A Hyper-Dimensional Space

ICDE '04 Proceedings of the 20th International Conference on Data Engineering

A practical approach to the 2D incremental nearest-point problem suitable for different point distributions

Pattern Recognition
Dynamic skyline queries in metric spaces

EDBT '08 Proceedings of the 11th international conference on Extending database technology: Advances in database technology
An index and retrieval framework integrating perceptive features and semantics for multimedia databases

Multimedia Tools and Applications
A flexible framework to ease nearest neighbor search in multidimensional data spaces

Data & Knowledge Engineering
Optimizing similarity-based image joins in a multimedia database

Proceedings of the international workshop on Very-large-scale multimedia corpus, mining and retrieval
Fast k-NN classifier for documents based on a graph structure

CIARP'10 Proceedings of the 15th Iberoamerican congress conference on Progress in pattern recognition, image analysis, computer vision, and applications
Pivot selection: Dimension reduction for distance-based indexing

Journal of Discrete Algorithms
Efficient processing of probabilistic group subspace skyline queries in uncertain databases

Information Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

As databases increasingly integrate different types ot Information SUCh as time-series, multimedia and scientific data, it becomes necessary to support efficient retrieval of multi-dimensional data. Both the dimensionality and the amount of data that needs to be processed are increasing rapidly. As a result of the scale and high dimensional nature, the traditional techniques have proven inadequate. In this paper, we propose search techniques that are effective especially for large high dimensional data sets. We first propose VA+-file technique which is based on scalar quantization of the data. VA+-file is especially useful for searching exact nearest neighbors (NN) in non-uniform high dimensional data sets. We then discuss how to improve the search and make it progressive by allowing some approximations in the query result. We develop a general framework for approximate NN queries, discuss various approaches for progressive processing of similarity queries, and develop a metric for evaluation of such techniques. Finally, a new technique based on clustering is proposed, which merges the benefits of various approaches for progressive similarity searching. Extensive experimental evaluation is performed on several real-life data sets. The evaluation establishes the superiority of the proposed techniques over the existing techniques for high dimensional similarity searching. The techniques proposed in this paper are effective for real-life data sets, which are typically non-uniform, and they are scalable with respect to both dimensionality and size of the data set.