Statistical modeling of dissimilarity increments for d-dimensional data: Application in partitional clustering

Authors:
Helena Aidos;Ana Fred
Affiliations:
Instituto de Telecomunicações, Instituto Superior Técnico, Lisbon, Portugal; Instituto de Telecomunicações, Instituto Superior Técnico, Lisbon, Portugal
Venue:
Pattern Recognition
Year:
2012

Abstract

This paper addresses the use of high-order dissimilarity models in data mining problems. We explore dissimilarities between triplets of nearest neighbors, called dissimilarity increments (DIs). We derive a statistical model of DIs for d-dimensional data (d-DID), assuming that the objects follow a multivariate Gaussian distribution. Empirical evidence shows that the d-DID is well approximated by the particular case d=2. We propose applying this model to clustering, through a partitional algorithm that uses a merge strategy on Gaussian components. Experimental results on synthetic and real datasets show that clustering algorithms using the DID usually outperform well-known clustering algorithms.
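
To make the notion of a dissimilarity increment concrete, the following is a minimal sketch in Python. The abstract does not spell out the formula, so the sketch assumes the definition commonly used in the dissimilarity-increments literature: for a point x_i with nearest neighbor x_j, and x_k the nearest neighbor of x_j other than x_i, the increment is |d(x_i, x_j) - d(x_j, x_k)|. The function name `dissimilarity_increments` and the choice of Euclidean distance are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def dissimilarity_increments(X):
    """Return one dissimilarity increment per row of X (an n x d array).

    Assumed definition (not given in the abstract): for each x_i, let x_j
    be its nearest neighbor and x_k the nearest neighbor of x_j other than
    x_i; the increment is |d(x_i, x_j) - d(x_j, x_k)|.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if n < 3:
        raise ValueError("need at least three points to form a triplet")

    # Pairwise Euclidean distances (assumption: Euclidean dissimilarity).
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    # Mask the diagonal so a point is never chosen as its own neighbor.
    np.fill_diagonal(D, np.inf)

    increments = np.empty(n)
    for i in range(n):
        j = int(np.argmin(D[i]))   # nearest neighbor of x_i
        row = D[j].copy()
        row[i] = np.inf            # exclude x_i when searching from x_j
        k = int(np.argmin(row))    # nearest neighbor of x_j other than x_i
        increments[i] = abs(D[i, j] - D[j, k])
    return increments


if __name__ == "__main__":
    # Hypothetical usage: increments for a small 2-D Gaussian sample.
    rng = np.random.default_rng(0)
    sample = rng.normal(size=(200, 2))
    dis = dissimilarity_increments(sample)
    print(dis.shape, float(dis.mean()))
```

An empirical histogram of these values is the kind of quantity that a statistical model such as the d-DID would be fitted to. The clustering algorithm itself (merging Gaussian components) is not reproduced here.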