Clustering and labeling of multi-dimensional mixed structured data

  • Authors:
  • Marco Brambilla;Massimiliano Zanoni

  • Affiliations:
  • Dipartimento di Elettronica e Informazione, Politecnico di Milano, Milano, Italy;Dipartimento di Elettronica e Informazione, Politecnico di Milano, Milano, Italy

  • Venue:
  • Search Computing
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Cluster Analysis consists of the aggregation of data items of a given set into subsets based on some similarity properties. Clustering techniques have been applied in many fields which typically involve a large amount of complex data. This study focuses on what we call multi-domain clustering and labeling, i.e. a set of techniques for multi-dimensional structured mixed data clustering. The work consists of studying the best mix of clustering techniques that address the problem in the multi-domain setting. Considered data types are numerical, categorical and textual. All of them can appear together within the same clustering scenario. We focus on k-means and agglomerative hierarchical clustering methods based on a new distance function we define for this specific setting. The proposed approach has been validated on some real and realistic data-sets based onto college, automobile and leisure fields. Experimental data allowed to evaluate the effectiveness of the different solutions, both for clustering and labeling.