A new initialization method for categorical data clustering

  • Authors: Fuyuan Cao; Jiye Liang; Liang Bai

  • Affiliations: Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Taiyuan 030006, China and School of Computer and Information Technology, Shanxi University ...; Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Taiyuan 030006, China and School of Computer and Information Technology, Shanxi University ...; School of Computer and Information Technology, Shanxi University, Taiyuan 030006, Shanxi, China

  • Venue: Expert Systems with Applications: An International Journal
  • Year: 2009

Abstract

In clustering algorithms, choosing a subset of representative examples from the data set is very important. Such "exemplars" can be found by randomly choosing an initial subset of data objects and then iteratively refining it, but this works well only if the initial choice is close to a good solution. In this paper, the average density of an object is defined based on the frequency of its attribute values. Furthermore, a novel initialization method for categorical data is proposed, which considers both the distance between objects and the density of each object. The proposed initialization method is also applied to the k-modes algorithm and the fuzzy k-modes algorithm. Experimental results illustrate that the proposed method is superior to random initialization and, because its time complexity is linear in the number of data objects, can be applied to large data sets.
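
The abstract does not give the exact formulas, so the following is only a minimal sketch of a density-and-distance initialization under stated assumptions, not the authors' definitive method. It assumes that the average density of an object is the mean relative frequency of its attribute values, that distance is simple matching (Hamming) distance, that the first center is the densest object, and that each later center maximizes the product of its density and its distance to the nearest center already chosen. The function names `average_density`, `hamming`, and `init_centers` are hypothetical.

```python
from collections import Counter


def average_density(data):
    """Assumed definition: mean relative frequency of an object's attribute values."""
    n = len(data)
    m = len(data[0])
    # One frequency table per attribute (column).
    counts = [Counter(row[j] for row in data) for j in range(m)]
    return [sum(counts[j][row[j]] for j in range(m)) / (m * n) for row in data]


def hamming(a, b):
    """Simple matching distance: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def init_centers(data, k):
    """Pick k initial centers using density and distance (assumed selection rule)."""
    dens = average_density(data)
    # First center: the object with the highest average density.
    first = max(range(len(data)), key=lambda i: dens[i])
    chosen = [first]
    # min_dist[i] is the distance from object i to its nearest chosen center.
    min_dist = [hamming(row, data[first]) for row in data]
    while len(chosen) < k:
        # Next center: far from the existing centers and in a dense region.
        nxt = max(range(len(data)), key=lambda i: min_dist[i] * dens[i])
        chosen.append(nxt)
        min_dist = [min(d, hamming(row, data[nxt]))
                    for d, row in zip(min_dist, data)]
    return [data[i] for i in chosen]


if __name__ == "__main__":
    toy = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "z"), ("b", "z")]
    print(init_centers(toy, 2))
```

Each round makes one pass over the objects, so this sketch costs on the order of n·m·k operations for n objects, m attributes, and k centers, which is linear in n. The returned objects could then serve as the starting modes for a k-modes or fuzzy k-modes run. If k exceeds the number of distinct objects, the sketch may select a duplicate.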