SS-ClusterTree: a subspace clustering based indexing algorithm over high-dimensional image features

Authors:
Hongli Xu; Dantong Yu; De Xu; Aidong Zhang
Affiliations:
Beijing Jiaotong University, Beijing, China; Brookhaven National Lab, Upton, NY, USA; Beijing Jiaotong University, Beijing, China; The State University of New York at Buffalo, Buffalo, NY, USA
Venue:
CIVR '08 Proceedings of the 2008 international conference on Content-based image and video retrieval
Year:
2008

Abstract

The rapid growth of image and video collections motivates research on index structures for image retrieval. Building an index over an image database is challenging because the data are high-dimensional and domain knowledge is often lacking. ClusterTree is an indexing approach that represents clusters generated by any existing clustering algorithm and does not need prior knowledge. It is a hierarchy of clusters and subclusters that incorporates the cluster representation into the index structure to achieve effective and efficient retrieval. However, one disadvantage of ClusterTree is that non-clustering data points are often ignored, even though they might represent interesting targets in an image database. In this paper, we propose a modified ClusterTree structure, called SS-ClusterTree, which is based on subspace clustering. The SS-ClusterTree has two kinds of leaf nodes: cluster leaf nodes and noise leaf nodes. When a new data item is added to the SS-ClusterTree, it is inserted into the corresponding cluster leaf node if it belongs to a cluster, and into a noise leaf node otherwise. A noise leaf node is split when its volume exceeds a certain threshold. We present a novel updating technique that optimizes the internal structure of the SS-ClusterTree by using Newton's law of universal gravitation. When a noise node is split, the attraction forces between every new node and its sibling nodes are calculated. A new node may be merged into a sibling node if the attraction force between them is the most significant. Meanwhile, the intersecting boundaries of the nodes are updated. This approach guarantees that the SS-ClusterTree always represents the current dataset structure, and it helps to identify patterns hidden in newly added data.
SS-ClusterTree efficiently supports dynamic insertion, manages datasets that contain non-clustering data, and is highly adaptive to any kind of cluster structure. Our experimental results also show that the index structure is effective and efficient.
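
The abstract does not say how the attraction force between nodes is computed. As a rough illustration only, the sketch below assumes that a node's "mass" is the number of points it holds and that the distance between two nodes is the Euclidean distance between their centroids, so the force follows the familiar form G * m1 * m2 / r^2. The names `Node`, `attraction`, `merge_into_strongest`, and the `min_force` cutoff are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple
import math

Point = Tuple[float, ...]


@dataclass
class Node:
    """A leaf node described only by the points it holds (hypothetical structure)."""
    points: List[Point] = field(default_factory=list)
    is_noise: bool = False

    @property
    def mass(self) -> int:
        # Assumption: the "mass" of a node is its point count.
        return len(self.points)

    @property
    def centroid(self) -> Point:
        dims = len(self.points[0])
        return tuple(sum(p[d] for p in self.points) / self.mass for d in range(dims))


def attraction(a: Node, b: Node, g: float = 1.0) -> float:
    """Newton-style force g * m_a * m_b / r^2 between the centroids of two nodes."""
    r2 = sum((x - y) ** 2 for x, y in zip(a.centroid, b.centroid))
    return math.inf if r2 == 0 else g * a.mass * b.mass / r2


def merge_into_strongest(new_node: Node, siblings: List[Node],
                         min_force: float) -> Optional[Node]:
    """Merge new_node into the sibling that attracts it most.

    Returns the sibling that absorbed the points, or None when there are no
    siblings or the strongest force is below min_force (an assumed cutoff).
    """
    if not siblings:
        return None
    best = max(siblings, key=lambda s: attraction(new_node, s))
    if attraction(new_node, best) < min_force:
        return None
    best.points.extend(new_node.points)
    return best


# Usage: a node produced by splitting a noise leaf is pulled toward the nearby cluster.
near = Node(points=[(0.0, 0.0), (0.0, 2.0)])
far = Node(points=[(10.0, 10.0)])
fresh = Node(points=[(0.0, 3.0)], is_noise=True)
absorbed_by = merge_into_strongest(fresh, [near, far], min_force=0.1)
```

Using point counts as mass means that large, nearby siblings attract a new node most strongly. Other choices, such as node volume or density as mass, would fit the abstract's description equally well.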