LDC: Enabling Search By Partial Distance In A Hyper-Dimensional Space

Authors:
Nick Koudas; Beng Chin Ooi; Heng Tao Shen; Anthony K. H. Tung
Venue:
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Year:
2004

Abstract

Recent advances in research fields like multimedia and bioinformatics have brought about a new generation of hyper-dimensional databases which can contain hundreds or even thousands of dimensions. Such hyper-dimensional databases pose significant problems to existing high-dimensional indexing techniques which have been developed for indexing databases with (commonly) less than a hundred dimensions. To support efficient querying and retrieval on hyper-dimensional databases, we propose a methodology called Local Digital Coding (LDC) which can support k-nearest neighbors (KNN) queries on hyper-dimensional databases and yet co-exist with ubiquitous indices, such as B+-trees. LDC extracts a simple bitmap representation called Digital Code (DC) for each point in the database. Pruning during KNN search is performed by dynamically selecting only a subset of the bits from the DC based on which subsequent comparisons are performed. In doing so, expensive operations involved in computing L-norm distance functions between hyper-dimensional data can be avoided. Extensive experiments are conducted to show that our methodology offers significant performance advantages over other existing indexing methods on both real life and synthetic hyper-dimensional datasets.
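
The abstract does not say how the Digital Code bits are derived or how the bit subset is chosen, so the following Python sketch is only one plausible reading, under stated assumptions: a single reference point (`center`, here the dataset mean) stands in for a cluster center, each bit records on which side of that center a coordinate lies, the query selects the `m` dimensions in which it is farthest from the center, and distance is squared Euclidean. All function names and parameters are hypothetical, and a plain linear scan stands in for the B+-tree. Under these assumptions, whenever a point and the query lie on opposite sides of the center in a dimension, their gap in that dimension is at least the query's gap to the center, which yields a cheap lower bound (a partial distance) that can rule a point out before its full distance is computed.

```python
import heapq
import random


def digital_code(point, center):
    """Assumed coding: bit i is 1 if point[i] >= center[i], else 0."""
    return [1 if p >= c else 0 for p, c in zip(point, center)]


def select_dims(query, center, m):
    """Pick the m dimensions where the query is farthest from the center."""
    order = sorted(range(len(query)),
                   key=lambda i: abs(query[i] - center[i]), reverse=True)
    return order[:m]


def lower_bound_sq(query, center, code, dims):
    """Partial squared distance that never exceeds the true squared distance.

    Only dimensions where the point's bit puts it on the opposite side of
    the center from the query contribute to the bound.
    """
    total = 0.0
    for i in dims:
        query_side = 1 if query[i] >= center[i] else 0
        if query_side != code[i]:
            total += (query[i] - center[i]) ** 2
    return total


def sq_dist(a, b):
    """Full squared Euclidean distance (the expensive operation)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn(points, codes, center, query, k, m):
    """Return (sorted [(sq_distance, index)], number of full distances)."""
    dims = select_dims(query, center, m)
    heap = []  # max-heap of the k best so far, stored as (-sq_distance, index)
    full = 0
    for idx, (point, code) in enumerate(zip(points, codes)):
        if len(heap) == k:
            kth = -heap[0][0]
            # Prune: the bound already rules this point out.
            if lower_bound_sq(query, center, code, dims) >= kth:
                continue
        full += 1
        d2 = sq_dist(query, point)
        if len(heap) < k:
            heapq.heappush(heap, (-d2, idx))
        elif d2 < -heap[0][0]:
            heapq.heapreplace(heap, (-d2, idx))
    return sorted((-neg, idx) for neg, idx in heap), full


# Usage on synthetic data (values are illustrative only).
random.seed(0)
DIM = 64
points = [[random.random() for _ in range(DIM)] for _ in range(200)]
center = [sum(p[i] for p in points) / len(points) for i in range(DIM)]
codes = [digital_code(p, center) for p in points]
query = [random.random() for _ in range(DIM)]
neighbours, full_computations = knn(points, codes, center, query, k=5, m=16)
```

Because the bound is never larger than the true squared distance, the pruned search returns the same neighbours as an exhaustive scan; how many full distance computations it saves depends on the data distribution and on the choice of `m`.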