A k-mean clustering algorithm for mixed numeric and categorical data

  • Authors:
  • Amir Ahmad;Lipika Dey

  • Affiliations:
  • Solid State Physics Laboratory, Timarpur, Delhi 110 054, India;Department of Mathematics, IIT Delhi, Hauz Khas, New Delhi 110 016, India

  • Venue:
  • Data & Knowledge Engineering
  • Year:
  • 2007

Abstract

The use of traditional k-mean type algorithms is limited to numeric data. This paper presents a clustering algorithm based on the k-mean paradigm that works well for data with mixed numeric and categorical features. We propose a new cost function and distance measure based on the co-occurrence of values. The measures also take into account the significance of an attribute towards the clustering process. We present a modified description of the cluster center to overcome the numeric-data-only limitation of the k-mean algorithm and provide a better characterization of clusters. The performance of this algorithm has been studied on real-world data sets. Comparisons with other clustering algorithms illustrate the effectiveness of this approach.
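
The abstract does not give the formula for the co-occurrence based distance, so the following is only a minimal sketch of one way such a measure could work, not the authors' definition. It assumes that two values of a categorical attribute are close when they co-occur with similar distributions of values in the other attributes, and it measures that with the total variation distance between the two conditional distributions, averaged over the other attributes. The function names `conditional_probs` and `cooccurrence_distance` are hypothetical, and the attribute significance weighting mentioned in the abstract is not modeled.

```python
from collections import Counter, defaultdict


def conditional_probs(rows, i, j):
    """Estimate P(attribute j = u | attribute i = x) from co-occurrence counts."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[i]][row[j]] += 1
    return {
        x: {u: c / sum(ctr.values()) for u, c in ctr.items()}
        for x, ctr in counts.items()
    }


def cooccurrence_distance(rows, i, x, y):
    """Illustrative distance between values x and y of categorical attribute i.

    For every other attribute j, compare the distribution of j's values
    among rows where attribute i equals x with the one where it equals y.
    sum(max(p, q)) - 1 equals the total variation distance between the two
    distributions (assumed here; the abstract does not specify the formula).
    Requires at least two attributes per row.
    """
    m = len(rows[0])
    total = 0.0
    for j in range(m):
        if j == i:
            continue
        probs = conditional_probs(rows, i, j)
        px, py = probs.get(x, {}), probs.get(y, {})
        values = set(px) | set(py)
        total += sum(max(px.get(u, 0.0), py.get(u, 0.0)) for u in values) - 1.0
    return total / (m - 1)


# Toy data set (made up for illustration): attribute 0 and attribute 1.
rows = [
    ("a", "p"), ("a", "p"),
    ("b", "q"), ("b", "q"),
    ("c", "p"), ("c", "q"),
]
d_ab = cooccurrence_distance(rows, 0, "a", "b")  # a and b never share a partner value
d_ac = cooccurrence_distance(rows, 0, "a", "c")  # c shares "p" with a half the time
```

In this toy data, "a" and "b" never co-occur with the same value of the second attribute, so they are maximally distant, while "c" lies between them. A distance of this kind can stand in for the Euclidean term on categorical attributes inside a k-mean style loop.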