High-dimensional nearest neighbor search with remote data centers

Authors:
Changzhou Wang;Xiaoyang Sean Wang
Affiliations:
Mathematics and Computing Technology, The Boeing Company, Bellevue, WA;Department of Information and Software Engineering, George Mason University, Fairfax, VA
Venue:
Knowledge and Information Systems
Year:
2002

Citing 28
Cited 1

Abstract

Many data centers have archived a tremendous amount of data and begun to publish it on the Web. Because of limited resources and a large number of service requests, data centers usually do not directly support high-cost queries. Users, on the other hand, are often overwhelmed by the huge data volume and cannot afford to download whole data sets and search them locally. To support high-dimensional nearest neighbor searches in this environment, the paper develops a multi-level approximation scheme. The coarsest-level approximations are stored locally and searched first. The result is then refined gradually via accesses to remote data centers. Data centers need only deliver data items, or their precomputed finer-level approximations, by identifier.

The search process is usually long in this environment, since it involves remote sites. The paper therefore describes an online search process: the system periodically reports a data item and a positive integer M, and the reported item is guaranteed to be one of the M nearest neighbors of the query item. The paper proposes two algorithms to minimize M in each period. Experiments show that one of them performs similarly to a theoretical a posteriori algorithm and significantly outperforms the online extensions of two state-of-the-art nearest neighbor search methods.
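
The reporting guarantee described in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not give the two proposed algorithms, so everything below is an assumption made for illustration. The sketch assumes only two levels (a local coarse approximation and the exact remote item), assumes each coarse approximation is an axis-aligned bounding box, uses Euclidean distance, and refines the item with the smallest lower bound first. The names `box_bounds`, `best_report`, `online_search`, and `fetch_exact` are hypothetical. The guarantee rests on one observation: an item with distance upper bound U can be outranked only by items whose lower bound is below U, so it is among the 1 + (number of such items) nearest neighbors.

```python
import math


def box_bounds(query, lo, hi):
    """Bound the Euclidean distance from query to any point in the box [lo, hi]."""
    lower_sq = upper_sq = 0.0
    for q, a, b in zip(query, lo, hi):
        nearest = min(max(q, a), b)  # clamp q into [a, b]
        farthest = a if abs(q - a) > abs(q - b) else b
        lower_sq += (q - nearest) ** 2
        upper_sq += (q - farthest) ** 2
    return math.sqrt(lower_sq), math.sqrt(upper_sq)


def best_report(bounds):
    """Pick the (item_id, M) pair with the smallest M.

    bounds maps item_id -> (lower, upper) distance bounds. Only items whose
    lower bound is below an item's upper bound can be strictly closer to the
    query, so that item is guaranteed to be among the M nearest.
    """
    best = None
    for i, (_, upper_i) in bounds.items():
        m = 1 + sum(
            1 for j, (lower_j, _) in bounds.items() if j != i and lower_j < upper_i
        )
        if best is None or m < best[1]:
            best = (i, m)
    return best


def online_search(query, boxes, fetch_exact, periods):
    """Report one (item_id, M) pair per period, refining one item per period.

    boxes maps item_id -> (lo, hi), the locally stored coarse approximation.
    fetch_exact(item_id) stands in for a remote data-center access (assumed).
    """
    bounds = {i: box_bounds(query, lo, hi) for i, (lo, hi) in boxes.items()}
    refined = set()
    reports = []
    for _ in range(periods):
        pending = [i for i in bounds if i not in refined]
        if pending:
            # Assumed heuristic: refine the most promising item first.
            i = min(pending, key=lambda k: bounds[k][0])
            d = math.dist(query, fetch_exact(i))
            bounds[i] = (d, d)  # exact distance collapses both bounds
            refined.add(i)
        reports.append(best_report(bounds))
    return reports
```

Calling `online_search(query, boxes, remote.__getitem__, periods=3)` with a dictionary `remote` of exact points returns one `(item_id, M)` pair per period. In this sketch the refinement order is a simple heuristic; choosing which approximations to request so that M shrinks as fast as possible is the problem the paper's two algorithms address.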