A New Nonparametric Pairwise Clustering Algorithm Based on Iterative Estimation of Distance Profiles

Authors:
Shlomo Dubnov;Ran El-Yaniv;Yoram Gdalyahu;Elad Schneidman;Naftali Tishby;Golan Yona
Affiliations:
Department of Communication Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel. dubnov@bgumail.bgu.ac.il
Department of Computer Science, Technion—Israel Institute of Technology, Haifa 32000, Israel. rani@cs.technion.ac.il
School of Computer Science and Engineering and Center for Neural Computation, Hebrew University, Jerusalem 91904, Israel. yoram@cs.huji.ac.il
School of Computer Science and Engineering, Department of Neurobiology and Center for Neural Computation, Hebrew University, Jerusalem 91904, Israel. elads@cs.huji.ac.il
School of Computer Science and Engineering and Center for Neural Computation, Hebrew University, Jerusalem 91904, Israel. tishby@cs.huji.ac.il
Department of Computer Science, Cornell University, Ithaca, NY 14853-7501, USA. golan@cs.cornell.edu
Venue:
Machine Learning - Special issue: Unsupervised learning
Year:
2002

Abstract

We present a novel pairwise clustering method. Given a proximity matrix of pairwise relations (i.e., pairwise similarity or dissimilarity estimates) between data points, our algorithm extracts the two most prominent clusters in the data set. The algorithm, which is completely nonparametric, iteratively applies a two-step transformation to the proximity matrix. The first step represents each point by its relation to all other data points, and the second step re-estimates the pairwise distances using a statistically motivated proximity measure on these representations. Using this transformation, the algorithm iteratively partitions the data points until it converges to two clusters. Although the algorithm is simple and intuitive, it generates complex dynamics in the proximity matrices. Based on this bipartition procedure we devise a hierarchical clustering algorithm, which applies the basic bipartition algorithm in a straightforward divisive manner. The hierarchical clustering algorithm addresses the model validation problem using a general cross-validation approach, which may be combined with various hierarchical clustering methods.

We further present an experimental study of this algorithm. We examine some of the algorithm's properties and its performance on some synthetic and ‘standard’ data sets. The experiments demonstrate the robustness of the algorithm and indicate that it generates a good clustering partition even when the data is noisy or corrupted.
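The two-step transformation described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how profiles are normalized or which proximity measure is used, so two choices here are assumptions of this sketch only: each row of the matrix is normalized to sum to one, and profiles are compared with the Jensen-Shannon divergence. The names `distance_profiles`, `js_divergence`, and `bipartition`, the `eps` offset, the stopping tolerance, and the final thresholding rule are all hypothetical.

```python
import math

def distance_profiles(D, eps=1e-12):
    """Step 1: represent each point by its normalized distances to all points.
    Assumption: rows are shifted by eps and scaled to sum to one."""
    profiles = []
    for row in D:
        shifted = [d + eps for d in row]
        total = sum(shifted)
        profiles.append([s / total for s in shifted])
    return profiles

def js_divergence(p, q):
    """Step 2 measure (assumed): Jensen-Shannon divergence of two profiles."""
    total = 0.0
    for a, b in zip(p, q):
        m = 0.5 * (a + b)
        total += 0.5 * a * math.log(a / m) + 0.5 * b * math.log(b / m)
    return max(total, 0.0)  # guard against tiny negative rounding errors

def bipartition(D, max_iter=200, tol=1e-9):
    """Iterate the two-step transformation until the matrix stops changing,
    then split the points by their distance to point 0 (hypothetical rule)."""
    n = len(D)
    for _ in range(max_iter):
        P = distance_profiles(D)
        new_D = [[js_divergence(P[i], P[j]) for j in range(n)] for i in range(n)]
        change = max(abs(new_D[i][j] - D[i][j])
                     for i in range(n) for j in range(n))
        D = new_D
        if change < tol:
            break
    cutoff = 0.5 * max(max(row) for row in D)
    return [int(d > cutoff) for d in D[0]]

# Toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
D0 = [[math.dist(p, q) for q in points] for p in points]
labels = bipartition(D0)
print(labels)
```

On this toy input, within-group distances shrink toward zero relative to between-group distances at each iteration, so the matrix approaches a two-valued form and the threshold separates the two groups. A divisive hierarchy along the lines of the abstract would call `bipartition` recursively on each resulting subset; the cross-validation stopping criterion is not sketched here.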