Independent Quantization: An Index Compression Technique for High-Dimensional Data Spaces

Venue:
ICDE '00 Proceedings of the 16th International Conference on Data Engineering
Year:
2000

Abstract

Two major approaches have been proposed to efficiently process queries in databases: speeding up the search by using index structures, and speeding up the search by operating on a compressed database, such as a signature file. Both approaches have their limitations: indexing techniques are inefficient in extreme configurations, such as high-dimensional spaces, where even a simple scan may be cheaper than an index-based search. Compression techniques are not very efficient in all other situations. We propose to combine both techniques to search for nearest neighbors in a high-dimensional space.

For this purpose, we develop a compressed index, called the IQ-tree, with a three-level structure: the first level is a regular (flat) directory consisting of minimum bounding boxes, the second level contains data points in a compressed representation, and the third level contains the actual data.

We overcome several engineering challenges in constructing an effective index structure of this type. The most significant of these is to decide how much to compress at the second level. Too much compression will lead to many needless expensive accesses to the third level. Too little compression will increase both the storage and the access cost for the first two levels.

We develop a cost model and an optimization algorithm based on this cost model that permits an independent determination of the degree of compression for each second level page to minimize expected query cost. In an experimental evaluation, we demonstrate that the IQ-tree shows a performance that is the "best of both worlds" for a wide range of data distributions and dimensionalities.
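
The three-level idea can be illustrated with a small sketch. It is not the IQ-tree algorithm from the paper: the page layout (sorting on the first coordinate), the uniform grid quantizer, the simple search loop, and all names (`quantize_page`, `build_index`, `nearest_neighbor`) are assumptions made for illustration, and the paper's cost model is not reproduced. The sketch only shows how a flat directory of bounding boxes, per-page quantized codes, and exact points can cooperate, and how the number of bits per page affects the number of third-level reads.

```python
import math
import random


def min_dist_to_box(q, lo, hi):
    """Smallest Euclidean distance from query q to the box [lo, hi]."""
    total = 0.0
    for x, l, h in zip(q, lo, hi):
        if x < l:
            total += (l - x) ** 2
        elif x > h:
            total += (x - h) ** 2
    return math.sqrt(total)


def quantize_page(points, bits):
    """Level 2 (assumed scheme): grid with 2**bits cells per dimension inside the page's box."""
    dims = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dims)]
    hi = [max(p[d] for p in points) for d in range(dims)]
    cells = 2 ** bits
    width = [(h - l) / cells for l, h in zip(lo, hi)]
    codes = []
    for p in points:
        code = tuple(
            min(int((p[d] - lo[d]) / width[d]), cells - 1) if width[d] > 0 else 0
            for d in range(dims)
        )
        codes.append(code)
    # "exact" stands in for the third level (the actual data on disk).
    return {"lo": lo, "hi": hi, "width": width, "codes": codes, "exact": points}


def build_index(points, page_size, bits):
    """Level 1: flat list of pages, each with its own bounding box.
    A single `bits` value is used here; each page stores its own grid,
    so a per-page choice would fit the same layout."""
    ordered = sorted(points)
    return [quantize_page(ordered[i:i + page_size], bits)
            for i in range(0, len(ordered), page_size)]


def nearest_neighbor(pages, q):
    """Exact nearest-neighbor search; returns (point, distance, exact_reads)."""
    best_point, best_dist, exact_reads = None, math.inf, 0
    ranked = sorted(pages, key=lambda pg: min_dist_to_box(q, pg["lo"], pg["hi"]))
    for page in ranked:
        if min_dist_to_box(q, page["lo"], page["hi"]) >= best_dist:
            break  # pages are ranked by lower bound, so none of the rest can help
        for code, point in zip(page["codes"], page["exact"]):
            cell_lo = [l + c * w for l, c, w in zip(page["lo"], code, page["width"])]
            cell_hi = [cl + w for cl, w in zip(cell_lo, page["width"])]
            if min_dist_to_box(q, cell_lo, cell_hi) >= best_dist:
                continue  # the compressed cell already rules this point out
            exact_reads += 1  # simulated access to the third level
            dist = math.dist(q, point)
            if dist < best_dist:
                best_point, best_dist = point, dist
    return best_point, best_dist, exact_reads


if __name__ == "__main__":
    random.seed(1)
    data = [tuple(random.random() for _ in range(8)) for _ in range(1000)]
    query = tuple(random.random() for _ in range(8))
    for bits in (1, 3, 6):
        _, dist, reads = nearest_neighbor(build_index(data, 25, bits), query)
        print(f"bits={bits} distance={dist:.4f} exact reads={reads}")
```

In this toy setting, coarser codes take less space but leave more candidates that must be verified against the exact data, while finer codes cost more storage at the second level. That is the tension the abstract describes; choosing the degree of compression per page from a cost model is the paper's contribution and is not attempted here.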