GEVA: geometric variability-based approaches for identifying patterns in data

  • Authors:
  • Itziar Irigoien;Concepcion Arenas;Elena Fernández;Francisco Mestres

  • Affiliations:
  • Euskal Herriko Unibertsitatea UPV-EHU, Department of Computation and Artificial Intelligence, Donostia, Spain;Universitat de Barcelona, Departament d’Estadística, Facultat de Biologia, Diagonal 645, 08028, Barcelona, Spain;Universitat Politècnica de Catalunya, Departament d’Estadística e Investigació Operativa, Barcelona, Spain;Universitat de Barcelona, Departament de Genètica, Barcelona, Spain

  • Venue:
  • Computational Statistics
  • Year:
  • 2010

Abstract

This paper, arising from population studies, develops clustering algorithms for identifying patterns in data. Based on the concept of geometric variability, we have developed one polythetic-divisive and three agglomerative algorithms. The effectiveness of these procedures is shown by relating them to classical clustering algorithms. They are very general: because they impose no constraints on the type of data, they are applicable to studies in many fields (economics, ecology, genetics, ...). Our major contributions include a rigorous formulation of novel clustering algorithms and the discovery of a new relationship between geometric variability and clustering. Finally, these novel procedures provide a theoretical framework, with an intuitive interpretation, for some classical clustering methods, allowing them to be applied to any type of data, including mixed data. These approaches are illustrated with real data on Drosophila chromosomal inversions.
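
As an illustration of the central quantity, here is a minimal sketch of the sample geometric variability, assuming the usual definition V = (1 / (2n²)) Σᵢ Σⱼ d²(i, j) for n observations and a distance d. The abstract does not give the paper's exact formulation, so this is not the authors' implementation. The names `geometric_variability` and `mismatch_distance` and the toy data are hypothetical, chosen only for this example.

```python
from itertools import product
from math import dist, sqrt


def geometric_variability(items, distance=dist):
    """Sample geometric variability: (1 / (2 n^2)) * sum of squared pairwise distances.

    `distance` can be any function of two observations, so the same code
    works for numeric, categorical, or mixed data (assumed definition).
    """
    n = len(items)
    if n == 0:
        raise ValueError("need at least one observation")
    # Sum over all ordered pairs (i, j), including i == j, which contributes 0.
    total = sum(distance(a, b) ** 2 for a, b in product(items, repeat=2))
    return total / (2 * n * n)


def mismatch_distance(a, b):
    """Hypothetical distance for categorical records: sqrt of the number of mismatches."""
    return sqrt(sum(x != y for x, y in zip(a, b)))


# Toy numeric data (made up): corners of a square, Euclidean distance.
numeric_toy = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
gv_numeric = geometric_variability(numeric_toy)

# Toy categorical data (made up), using the mismatch distance above.
categorical_toy = [("A", "x"), ("A", "y"), ("B", "y")]
gv_categorical = geometric_variability(categorical_toy, mismatch_distance)
```

With the Euclidean distance, this quantity coincides with the total variance of the data (the sum of per-coordinate variances with divisor n), which is why it can serve as a measure of spread. Because only a distance function is required, the same computation carries over to non-numeric data.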