Statistical modeling of dissimilarity increments for d-dimensional data: Application in partitional clustering

  • Authors:
  • Helena Aidos; Ana Fred

  • Affiliations:
  • Instituto de Telecomunicações, Instituto Superior Técnico, Lisbon, Portugal; Instituto de Telecomunicações, Instituto Superior Técnico, Lisbon, Portugal

  • Venue:
  • Pattern Recognition
  • Year:
  • 2012

Abstract

This paper addresses the use of high-order dissimilarity models in data mining problems. We explore dissimilarities between triplets of nearest neighbors, called dissimilarity increments (DIs). We derive a statistical model of DIs for d-dimensional data (d-DID), assuming that the objects follow a multivariate Gaussian distribution. Empirical evidence shows that the d-DID is well approximated by the particular case d=2. We propose applying this model to clustering, using a partitional algorithm with a merge strategy on Gaussian components. Experimental results on synthetic and real datasets show that clustering algorithms using the DID usually outperform well-known clustering algorithms.
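
The abstract does not spell out how a dissimilarity increment is computed, so the following is only a minimal sketch under an assumed definition: for a point x_i, take its nearest neighbor x_j, then the nearest neighbor x_k of x_j other than x_i, and use |d(x_i, x_j) - d(x_j, x_k)| as the increment. The function name `dissimilarity_increments` and the choice of Euclidean distance are illustrative assumptions, not taken from the paper.

```python
import math


def dissimilarity_increments(points):
    """Return one dissimilarity increment per point (assumed definition).

    For each x_i: x_j is its nearest neighbor, x_k is the nearest
    neighbor of x_j other than x_i, and the increment is
    |d(x_i, x_j) - d(x_j, x_k)|, with d the Euclidean distance.
    """
    n = len(points)
    if n < 3:
        raise ValueError("need at least three points to form a triplet")

    def nearest(idx, excluded):
        # Brute-force nearest neighbor of points[idx], skipping excluded indices.
        candidates = [c for c in range(n) if c not in excluded]
        return min(candidates, key=lambda c: math.dist(points[idx], points[c]))

    increments = []
    for i in range(n):
        j = nearest(i, {i})
        k = nearest(j, {i, j})
        d_ij = math.dist(points[i], points[j])
        d_jk = math.dist(points[j], points[k])
        increments.append(abs(d_ij - d_jk))
    return increments


# Three collinear points with gaps of 1 and 2.
print(dissimilarity_increments([(0, 0), (1, 0), (3, 0)]))
```

The brute-force neighbor search is quadratic in the number of points, which keeps the sketch short; a spatial index would be the natural replacement for larger datasets. The empirical distribution of these increments is what a model such as the d-DID would be fitted against.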