GMM-ClusterForest: a novel indexing approach for multi-features based similarity search in high-dimensional spaces

Authors:
Yuchai Wan; Xiabi Liu; Kunqi Tong; Xue Wei; Yi Wu; Fei Guan; Kunpeng Pang
Affiliations:
Beijing Lab of Intelligent Information Technology, School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China (all authors)
Venue:
ICONIP'12 Proceedings of the 19th international conference on Neural Information Processing - Volume Part II
Year:
2012

Citing 8
Cited 0

Density-based indexing for approximate nearest-neighbor queries
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining

SIMPLIcity: Semantics-Sensitive Integrated Matching for Picture LIbraries
IEEE Transactions on Pattern Analysis and Machine Intelligence

Clustering for Approximate Similarity Search in High-Dimensional Spaces
IEEE Transactions on Knowledge and Data Engineering

A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces
VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases

Contorting high dimensional data for efficient main memory KNN processing
Proceedings of the 2003 ACM SIGMOD international conference on Management of data

ClusterTree: Integration of Cluster Representation and Nearest-Neighbor Search for Large Data Sets with High Dimensions
IEEE Transactions on Knowledge and Data Engineering

SS-ClusterTree: a subspace clustering based indexing algorithm over high-dimensional image features
CIVR '08 Proceedings of the 2008 international conference on Content-based image and video retrieval

A kurtosis-based dynamic approach to Gaussian mixture modeling
IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans

Abstract

This paper proposes GMM-ClusterForest, a novel clustering-based indexing approach that supports multi-feature similarity search in high-dimensional spaces. We fit a Gaussian Mixture Model (GMM) to the data, using the Expectation-Maximization (EM) algorithm to estimate the GMM parameters and the Minimum Description Length (MDL) criterion to select the GMM structure. Each Gaussian component in the GMM is taken as a cluster center, and each data point is assigned to a cluster according to the Bayesian decision rule. Performing this clustering hierarchically yields an index tree, together with a corresponding similarity search method, for one type of feature. Multi-feature similarity search is then achieved by fusing the index trees for all feature types considered. We evaluated the proposed indexing approach by applying it to example-based image retrieval, with experiments on the Corel 1000 dataset and a self-collected large dataset. The experimental results show that our approach is effective and promising.
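
The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes 1-D data for brevity (the paper targets high-dimensional features), a deterministic quantile-based initialization, and the common two-part MDL form, negative log-likelihood plus half the parameter count times log n. All function names (`fit_gmm_em`, `mdl_score`, `select_gmm_mdl`, `assign_bayes`, `build_tree`) are hypothetical. The fusion of trees across feature types is not shown, since the abstract does not specify it.

```python
import math

def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def mixture_terms(x, model):
    """Weighted component densities w_j * N(x | m_j, v_j) for one point."""
    weights, means, variances = model
    return [w * gauss_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]

def fit_gmm_em(data, k, iters=100, min_var=1e-6):
    """Fit a k-component 1-D GMM by EM; returns (model, log_likelihood)."""
    n = len(data)
    ordered = sorted(data)
    # Assumption: initialize means at evenly spaced quantiles (deterministic).
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    center = sum(data) / n
    spread = max(sum((x - center) ** 2 for x in data) / n, min_var)
    weights, variances = [1.0 / k] * k, [spread] * k
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in data:
            terms = mixture_terms(x, (weights, means, variances))
            total = max(sum(terms), 1e-300)
            resp.append([t / total for t in terms])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, min_var)  # floor avoids a zero variance
    model = (weights, means, variances)
    log_lik = sum(math.log(max(sum(mixture_terms(x, model)), 1e-300)) for x in data)
    return model, log_lik

def mdl_score(log_lik, k, n):
    """Two-part MDL: a 1-D GMM with k components has 3k - 1 free parameters."""
    return -log_lik + 0.5 * (3 * k - 1) * math.log(n)

def select_gmm_mdl(data, max_k):
    """Try k = 1..max_k and keep the model with the smallest MDL score."""
    best = None
    for k in range(1, min(max_k, len(data)) + 1):
        model, log_lik = fit_gmm_em(data, k)
        score = mdl_score(log_lik, k, len(data))
        if best is None or score < best[0]:
            best = (score, k, model)
    return best[1], best[2]

def assign_bayes(x, model):
    """Bayesian decision rule: index of the component with the largest posterior."""
    terms = mixture_terms(x, model)
    return max(range(len(terms)), key=terms.__getitem__)

def build_tree(data, max_k=3, min_size=10, depth=0, max_depth=8):
    """Apply the clustering recursively to obtain a simple index tree."""
    leaf = {"points": data, "children": []}
    if len(data) <= min_size or depth >= max_depth:
        return leaf
    k, model = select_gmm_mdl(data, max_k)
    groups = [[] for _ in range(k)]
    for x in data:
        groups[assign_bayes(x, model)].append(x)
    groups = [g for g in groups if g]
    if len(groups) < 2:  # no effective split, so stop here
        return leaf
    children = [build_tree(g, max_k, min_size, depth + 1, max_depth) for g in groups]
    return {"model": model, "points": data, "children": children}
```

Calling `build_tree` on a list of floats returns nested dictionaries. A query could descend the tree by applying `assign_bayes` at each internal node, although the search procedure actually used in the paper is not described in the abstract.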