The rapid growth in the volume of image and video collections motivates research on index structures for image information retrieval. Constructing an index over an image database is very challenging because the data are high-dimensional and domain knowledge is lacking. ClusterTree is an indexing approach that represents clusters generated by any existing clustering approach and does not need any prior knowledge. It is a hierarchy of clusters and subclusters that incorporates the cluster representation into the index structure to achieve effective and efficient retrieval. However, one disadvantage of ClusterTree is that non-clustering data points are often ignored, even though these points might represent interesting targets in an image database. In this paper, we propose a modified ClusterTree structure (called SS-ClusterTree), which is based on subspace clustering. The SS-ClusterTree includes two kinds of leaf nodes: cluster leaf nodes and noise leaf nodes. When a new data item is added to the SS-ClusterTree, it is inserted into the corresponding cluster leaf node if it belongs to a cluster, and into the noise leaf node otherwise. A noise leaf node is split when its volume exceeds a certain threshold. We present a novel updating technique that optimizes the internal structure of the SS-ClusterTree by utilizing Newton's Law of Universal Gravitation. When a noise node is split, the attraction forces between every new node and its sibling nodes are calculated. A new node may be merged into a sibling node if the attraction force between them is the most significant. Meanwhile, the nodes' intersecting boundaries are updated. This approach guarantees that the SS-ClusterTree always represents the current dataset structure, and helps to identify patterns hidden in the newly added data.
SS-ClusterTree efficiently supports dynamic insertion, manages datasets that contain non-clustering data, and is highly adaptive to any kind of cluster structure. Our experimental results also show that this index structure is effective and efficient.
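
The gravitation-based merge step described above can be illustrated with a minimal sketch. The abstract does not define how mass or distance are measured, so everything here is an assumption made for illustration: a node's mass is taken to be its point count, the distance is the Euclidean distance between centroids, and the names `Node`, `attraction`, `merge_into_strongest_sibling`, and the `min_force` threshold are hypothetical rather than taken from the paper.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    """Hypothetical leaf node: a bag of points summarised by its centroid."""
    points: List[Point] = field(default_factory=list)
    is_noise: bool = False

    @property
    def mass(self) -> int:
        # Assumption: mass is the number of points stored in the node.
        return len(self.points)

    @property
    def centroid(self) -> Point:
        dims = len(self.points[0])
        return tuple(sum(p[d] for p in self.points) / self.mass
                     for d in range(dims))


def attraction(a: Node, b: Node, g: float = 1.0) -> float:
    """Newton-style force g * m_a * m_b / r^2 between two non-empty nodes."""
    r2 = sum((x - y) ** 2 for x, y in zip(a.centroid, b.centroid))
    # Coincident centroids are treated as infinitely attractive.
    return math.inf if r2 == 0 else g * a.mass * b.mass / r2


def merge_into_strongest_sibling(new_node: Node,
                                 siblings: List[Node],
                                 min_force: float) -> Optional[Node]:
    """Merge new_node into the sibling that attracts it most strongly.

    Returns the absorbing sibling, or None if no sibling reaches
    min_force (an assumed cut-off), in which case new_node stays separate.
    """
    if not siblings:
        return None
    best = max(siblings, key=lambda s: attraction(new_node, s))
    if attraction(new_node, best) < min_force:
        return None
    best.points.extend(new_node.points)
    return best


if __name__ == "__main__":
    dense = Node([(0.0, 0.0), (0.0, 2.0)])   # centroid (0, 1), mass 2
    sparse = Node([(3.0, 1.0)])              # centroid (3, 1), mass 1
    fresh = Node([(1.0, 1.0)], is_noise=True)
    absorbed_by = merge_into_strongest_sibling(fresh, [dense, sparse], 1.0)
    print(absorbed_by is dense, dense.mass)
```

In this sketch a node produced by splitting a noise leaf is absorbed only by the single most attractive sibling; a real implementation would also need to update bounding regions after the merge, which is omitted here.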