Clustering for Approximate Similarity Search in High-Dimensional Spaces

Authors:
Chen Li;Edward Chang;Hector Garcia-Molina;Gio Wiederhold
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2002

Abstract

In this paper, we present Clindex, a clustering and indexing paradigm for high-dimensional search spaces. The scheme is designed for approximate similarity searches, in which one wants to find many of the data points near a target point but can tolerate missing a few of them. For such searches, our scheme finds near points with high recall in very few I/Os and performs significantly better than other approaches. It works by finding clusters and then building a simple but efficient index over them. We analyze the trade-offs involved in clustering and in building such an index structure, and we present extensive experimental results.
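
The general idea of clustering first and then probing only a few clusters per query can be illustrated with a small sketch. This is not the Clindex algorithm itself, which the abstract does not specify; it is a generic cluster-then-probe search that assumes a few k-means (Lloyd) iterations for clustering and a plain list of centroids as the index. All names (build_index, approx_knn, n_probe, and so on) are hypothetical. Each bucket stands in for a cluster that would be fetched from disk as a unit, so n_probe is a rough proxy for the number of I/Os.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(points, centroids):
    """Group points into buckets by their nearest centroid."""
    buckets = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        buckets[nearest].append(p)
    return buckets


def build_index(points, n_clusters, n_iters=5, seed=0):
    """Cluster points (assumed k-means) and return (centroids, buckets)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, n_clusters)
    for _ in range(n_iters):
        buckets = assign(points, centroids)
        # Move each centroid to the mean of its bucket; keep it if the bucket is empty.
        centroids = [
            tuple(sum(c) / len(b) for c in zip(*b)) if b else centroids[i]
            for i, b in enumerate(buckets)
        ]
    # Final assignment so that buckets match the returned centroids.
    return centroids, assign(points, centroids)


def approx_knn(query, centroids, buckets, k, n_probe=1):
    """Search only the n_probe clusters whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda i: dist(query, centroids[i]))
    candidates = [p for i in order[:n_probe] for p in buckets[i]]
    return sorted(candidates, key=lambda p: dist(query, p))[:k]


def exact_knn(query, points, k):
    """Brute-force baseline that scans every point."""
    return sorted(points, key=lambda p: dist(query, p))[:k]


def recall(approx, exact):
    """Fraction of the true nearest neighbors that the approximate search found."""
    return len(set(approx) & set(exact)) / len(exact)


# Small synthetic demo: uniform random points in 8 dimensions (an assumption,
# not data from the paper).
rng = random.Random(1)
data = [tuple(rng.random() for _ in range(8)) for _ in range(500)]
centroids, buckets = build_index(data, n_clusters=10)
query = tuple(rng.random() for _ in range(8))
truth = exact_knn(query, data, k=10)
for n_probe in (1, 3, 10):
    found = approx_knn(query, centroids, buckets, k=10, n_probe=n_probe)
    print(n_probe, recall(found, truth))
```

Raising n_probe trades more cluster reads for higher recall; probing every cluster reduces to the exact search. Uniform random data has little cluster structure, so recall at small n_probe in this demo may be lower than on data with natural clusters.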