R-map: mapping categorical data for clustering and visualization based on reference sets

  • Authors:
  • Zhi-Yong Shen; Jun Sun; Yi-Dong Shen; Ming Li

  • Affiliations:
  • Lab. of Computer Science, Institute of Software, Chinese Academy of Sciences, China (Zhi-Yong Shen, Jun Sun, Yi-Dong Shen); Department of Epidemiology, Michigan State University (Ming Li)

  • Venue:
  • PAKDD'08: Proceedings of the 12th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
  • Year:
  • 2008

Abstract

In this paper, we propose a framework that maps categorical data into a numerical data space via a reference set. The goal is to make existing numerical clustering algorithms directly applicable to the generated image data set, and to allow the data to be visualized. Using statistical theory, we analyze the framework and give the conditions under which the data mapping is efficient and still preserves a flexible property of the original data, i.e., data points within the same cluster are more similar to one another than to points in other clusters. The algorithm is simple and is effective under certain conditions. An experimental evaluation on numerous categorical data sets shows that it not only outperforms related data mapping approaches but also beats several categorical clustering algorithms in both effectiveness and efficiency.
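To make the idea of mapping through a reference set concrete, here is a minimal sketch. It assumes that each categorical record is mapped to a vector of its similarities to the records of a reference set, and that similarity is the simple overlap measure (the number of attributes with matching categories). Both the similarity choice and the helper names `overlap_similarity`, `map_to_numeric`, and `squared_distance` are illustrative assumptions, not the paper's exact definitions.

```python
from typing import Sequence

# A categorical record: one category label per attribute.
Record = Sequence[str]


def overlap_similarity(x: Record, r: Record) -> int:
    """Count attributes on which x and r share the same category.

    Assumed similarity measure for this sketch; the paper may use another.
    """
    if len(x) != len(r):
        raise ValueError("records must have the same number of attributes")
    return sum(1 for a, b in zip(x, r) if a == b)


def map_to_numeric(data: Sequence[Record], reference: Sequence[Record]) -> list[list[int]]:
    """Map each record to its vector of similarities to every reference record.

    The result is a numeric "image" data set with one dimension per reference record.
    """
    return [[overlap_similarity(x, r) for r in reference] for x in data]


def squared_distance(u: Sequence[int], v: Sequence[int]) -> int:
    """Squared Euclidean distance between two mapped vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


if __name__ == "__main__":
    data = [
        ("red", "small", "round"),
        ("red", "small", "oval"),
        ("blue", "large", "square"),
        ("blue", "large", "rect"),
    ]
    # Hypothetical reference set: one representative per intuitive group.
    reference = [("red", "small", "round"), ("blue", "large", "square")]
    image = map_to_numeric(data, reference)
    print(image)  # [[3, 0], [2, 0], [0, 3], [0, 2]]
    # Records from the same group land close together in the numeric space.
    print(squared_distance(image[0], image[1]))  # 1
    print(squared_distance(image[0], image[2]))  # 18
```

Because the output is an ordinary list of numeric vectors, it could be handed to any off-the-shelf numerical clustering routine (for example a k-means implementation) or plotted directly when the reference set has two or three records.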