Efficient histogram-based similarity search in ultra-high dimensional space

Authors:
Jiajun Liu;Zi Huang;Heng Tao Shen;Xiaofang Zhou
Affiliations:
School of ITEE, University of Queensland, Australia;School of ITEE, University of Queensland, Australia and Queensland Research Laboratory, National ICT Australia;School of ITEE, University of Queensland, Australia;School of ITEE, University of Queensland, Australia and Queensland Research Laboratory, National ICT Australia
Venue:
DASFAA'11 Proceedings of the 16th international conference on Database systems for advanced applications: Part II
Year:
2011


Abstract

Recent developments in image content analysis have shown that the dimensionality of an image feature can reach thousands or more to obtain satisfactory results in some applications, such as face recognition. Although high-dimensional indexing has been studied extensively in the database literature, most existing methods have been tested on feature spaces with fewer than hundreds of dimensions, and their performance degrades quickly as dimensionality increases. Given the popularity of histogram features for representing image content, in this paper we propose a novel indexing structure for efficient histogram-based similarity search in ultra-high-dimensional spaces that are also sparse. Observing that all possible histogram values in a domain form a finite set of discrete states, we leverage the time and space efficiency of the inverted file. Our new structure, named the two-tier inverted file, indexes the data space at two levels: the first level holds the list of occurring states for each individual dimension, and the second level holds the list of images in which each state occurs. During query processing, candidates are identified quickly with a simple weighted state-voting scheme before their actual distances to the query are computed. To further enrich the discriminative power of the inverted file, we also introduce an effective state expansion method that takes information from neighboring dimensions into consideration. Our extensive experiments on real-life face datasets with 15,488-dimensional histogram features demonstrate the high accuracy of our proposal and its large performance improvement over existing methods.
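The two-level layout and the vote-then-verify query flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names (`build_index`, `search`, `l1_distance`), the sparse dictionary representation of a histogram, the uniform default vote weight, and the use of L1 distance for the final ranking are all assumptions made for illustration, since the abstract does not specify the weighting scheme or the distance measure. State expansion is not modelled.

```python
from collections import defaultdict


def build_index(histograms):
    """Build a two-level inverted structure (illustrative sketch only).

    histograms: dict mapping image_id -> {dimension: state}, where zero
    bins are omitted (sparse representation, an assumption).
    Level 1: dimension -> states occurring in that dimension.
    Level 2: state -> list of images having that state in that dimension.
    """
    index = defaultdict(lambda: defaultdict(list))
    for image_id, hist in histograms.items():
        for dim, state in hist.items():
            index[dim][state].append(image_id)
    return index


def l1_distance(a, b):
    """L1 distance between two sparse histograms (assumed metric)."""
    dims = set(a) | set(b)
    return sum(abs(a.get(d, 0) - b.get(d, 0)) for d in dims)


def search(index, histograms, query, k=1, n_candidates=10,
           weight=lambda dim, state: 1.0):
    """Vote for candidates, then rank them by actual distance.

    weight is a hypothetical per-(dimension, state) vote weight; the
    uniform default is a placeholder, not the paper's weighting.
    """
    votes = defaultdict(float)
    for dim, state in query.items():
        states = index.get(dim)
        if states is None:
            continue  # no indexed image has a non-zero bin here
        for image_id in states.get(state, ()):
            votes[image_id] += weight(dim, state)
    # Keep the most-voted images (ties broken by id for determinism).
    candidates = sorted(votes, key=lambda i: (-votes[i], i))[:n_candidates]
    # Verify candidates with the actual distance to the query.
    ranked = sorted(candidates,
                    key=lambda i: (l1_distance(histograms[i], query), i))
    return ranked[:k]


if __name__ == "__main__":
    data = {"a": {0: 3, 5: 1}, "b": {0: 3, 7: 2}, "c": {2: 4}}
    idx = build_index(data)
    print(search(idx, data, {0: 3, 5: 1}, k=2))
```

In this sketch only images that share at least one exact (dimension, state) pair with the query receive votes, so distance computation is limited to a short candidate list rather than the whole collection.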