ClusterTree: Integration of Cluster Representation and Nearest-Neighbor Search for Large Data Sets with High Dimensions

Authors:
Dantong Yu;Aidong Zhang
Affiliations:
-;-
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2003

Citing 23
Cited 11

Introduction to algorithms

Introduction to algorithms
The R*-tree: an efficient and robust access method for points and rectangles

SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data
On packing R-trees

CIKM '93 Proceedings of the second international conference on Information and knowledge management
Nearest neighbor queries

SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
Texture Features for Browsing and Retrieval of Image Data

IEEE Transactions on Pattern Analysis and Machine Intelligence
The SR-tree: an index structure for high-dimensional nearest neighbor queries

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
CURE: an efficient clustering algorithm for large databases

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
The pyramid-technique: towards breaking the curse of dimensionality

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Optimal multi-step k-nearest neighbor search

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
A comparative study of clustering methods

Future Generation Computer Systems - Special double issue on data mining
Semantic clustering and querying on heterogeneous features for visual data

MULTIMEDIA '98 Proceedings of the sixth ACM international conference on Multimedia
Fast algorithms for projected clustering

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Density-based indexing for approximate nearest-neighbor queries

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
A cost model for query processing in high dimensional data spaces

ACM Transactions on Database Systems (TODS)
The K-D-B-tree: a search structure for large multidimensional dynamic indexes

SIGMOD '81 Proceedings of the 1981 ACM SIGMOD international conference on Management of data
Similarity Indexing with the SS-tree

ICDE '96 Proceedings of the Twelfth International Conference on Data Engineering
M-tree: An Efficient Access Method for Similarity Search in Metric Spaces

VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
WaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases

VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Efficient and Effective Clustering Methods for Spatial Data Mining

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
The X-tree: An Index Structure for High-Dimensional Data

VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
WaveCluster: a wavelet-based clustering approach for spatial data in very large databases

The VLDB Journal — The International Journal on Very Large Data Bases
The Hybrid Tree: An Index Structure for High Dimensional Feature Spaces

ICDE '99 Proceedings of the 15th International Conference on Data Engineering
Multidimensional indexing and management for large-scale databases

Multidimensional indexing and management for large-scale databases

Towards Exploring Interactive Relationship between Clusters and Outliers in Multi-Dimensional Data Analysis

ICDE '05 Proceedings of the 21st International Conference on Data Engineering
A privacy preserving technique for distance-based classification with worst case privacy guarantees

Data & Knowledge Engineering
SS-ClusterTree: a subspace clustering based indexing algorithm over high-dimensional image features

CIVR '08 Proceedings of the 2008 international conference on Content-based image and video retrieval
Context-sensitive queries for image retrieval in digital libraries

Journal of Intelligent Information Systems
Efficient Processing of Nearest Neighbor Queries in Parallel Multimedia Databases

DEXA '08 Proceedings of the 19th international conference on Database and Expert Systems Applications
SubCOID: an attempt to explore cluster-outlier iterative detection approach to multi-dimensional data analysis in subspace

Proceedings of the 46th Annual Southeast Regional Conference on XX
An indexing approach for representing multimedia objects in high-dimensional spaces based on expectation maximization algorithm

MIS'05 Proceedings of the 11th international conference on Advances in Multimedia Information Systems
Efficient word retrieval by means of SOM clustering and PCA

DAS'06 Proceedings of the 7th international conference on Document Analysis Systems
On the usage of clustering for content based image retrieval

CSR'07 Proceedings of the Second international conference on Computer Science: theory and applications
A data allocation method for efficient content-based retrieval in parallel multimedia databases

ISPA'07 Proceedings of the 2007 international conference on Frontiers of High Performance Computing and Networking
GMM-ClusterForest: a novel indexing approach for multi-features based similarity search in high-dimensional spaces

ICONIP'12 Proceedings of the 19th international conference on Neural Information Processing - Volume Part II

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this paper, we introduce the ClusterTree, a new indexing approach to representing clusters generated by any existing clustering approach. A cluster is decomposed into several subclusters and represented as the union of the subclusters. The subclusters can be further decomposed, which isolates the most related groups within the clusters. A ClusterTree is a hierarchy of clusters and subclusters which incorporates the cluster representation into the index structure to achieve effective and efficient retrieval. Our cluster representation is highly adaptive to any kind of cluster. It is well accepted that most existing indexing techniques degrade rapidly as the dimensions increase. The ClusterTree provides a practical solution to index clustered data sets and supports the retrieval of the nearest-neighbors effectively without having to linearly scan the high-dimensional data set. We also discuss an approach to dynamically reconstruct the ClusterTree when new data is added. We present the detailed analysis of this approach and justify it extensively with experiments.