A unique property of single-link distance and its application in data clustering

Authors:
Yuqing Song; Shuyuan Jin; Jie Shen
Affiliations:
Tianjin University of Technology and Education, 1310 Dagu South Road, Hexi District, Tianjin 300222, China; Institute of Computing Technology, Chinese Academy of Sciences, China; Department of Computer & Information Science, University of Michigan, Dearborn, United States
Venue:
Data & Knowledge Engineering
Year:
2011

Abstract

We prove a unique property of single-link distance and use it to design a data clustering algorithm. The property states that a single-link cluster is a subset whose inter-subset distance is greater than its intra-subset distance, and vice versa. Among the major linkages (single, complete, average, centroid, median, and Ward's), only single-link distance has this property. Based on this property, we introduce monotonic sequences of iclusters (i.e., single-link clusters) to model the phenomenon that a natural cluster has a dense kernel and that its density decreases from the kernel to the boundary. A monotonic sequence of iclusters is a sequence of nested iclusters in which each icluster is a dominant child (in terms of size) of the icluster before it. Our data clustering algorithm is based on monotonic sequences. We classify a data set consisting of one monotonic sequence into two classes by splitting the sequence into two parts: the kernel part and the surrounding part. For a data set of multiple monotonic sequences, each leaf monotonic sequence represents the kernel of a class, which then "grows" by absorbing nearby non-kernel points. Experiments show that this algorithm compares favorably in effectiveness with other clustering algorithms.
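
The stated property can be checked numerically on a toy data set. The sketch below is not the authors' algorithm, and the abstract does not give formal definitions, so it rests on two assumptions: the intra-subset distance is taken to be the smallest threshold at which the subset is connected on its own (the longest edge of its minimum spanning tree), and the inter-subset distance is taken to be the smallest distance from a point in the subset to a point outside it. All function names and the sample points are hypothetical.

```python
from itertools import combinations


def dist(p, q):
    """Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def single_link_clusters(points, threshold):
    """Single-link clusters at `threshold`: connected components of the
    graph that joins every pair of points at distance <= threshold."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def intra_distance(points, subset):
    """ASSUMED definition: longest edge of the subset's minimum spanning
    tree (0.0 for a singleton), computed with Prim's algorithm."""
    subset = list(subset)
    if len(subset) < 2:
        return 0.0
    # best[i] = distance from point i to the closest point already in the tree
    best = {i: dist(points[subset[0]], points[i]) for i in subset[1:]}
    longest = 0.0
    while best:
        nxt = min(best, key=best.get)
        longest = max(longest, best.pop(nxt))
        for i in best:
            best[i] = min(best[i], dist(points[nxt], points[i]))
    return longest


def inter_distance(points, subset):
    """ASSUMED definition: smallest distance from a point in the subset to
    a point outside it (infinity if the subset is the whole data set)."""
    inside = set(subset)
    outside = [i for i in range(len(points)) if i not in inside]
    return min(
        (dist(points[i], points[j]) for i in inside for j in outside),
        default=float("inf"),
    )


# Hypothetical toy data: two tight groups and one isolated point.
points = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (20, 0)]

for cluster in single_link_clusters(points, threshold=1.5):
    intra = intra_distance(points, cluster)
    inter = inter_distance(points, cluster)
    print(cluster, "intra =", round(intra, 3), "inter =", round(inter, 3))
```

Under these assumed definitions, every cluster found at threshold 1.5 has an inter-subset distance larger than its intra-subset distance, while a subset that is not a single-link cluster, such as indices `[0, 1, 3]`, fails the inequality. This illustrates both directions of the "and vice versa" in the abstract on this example only; it is not a proof.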