Data Reduction Method for Categorical Data Clustering

  • Authors:
  • Eréndira Rendón;J. Salvador Sánchez;Rene A. Garcia;Itzel Abundez;Citlalih Gutierrez;Eduardo Gasca

  • Affiliations:
  • Lab. Reconocimiento de Patrones, Instituto Tecnológico de Toluca, Metepec, (México) 52140;Dept. Llenguatges i Sistemes Informàtics, Universitat Jaume I, Castelló de la Plana, (Spain) E-12071;Lab. Reconocimiento de Patrones, Instituto Tecnológico de Toluca, Metepec, (México) 52140;Lab. Reconocimiento de Patrones, Instituto Tecnológico de Toluca, Metepec, (México) 52140;Lab. Reconocimiento de Patrones, Instituto Tecnológico de Toluca, Metepec, (México) 52140;Lab. Reconocimiento de Patrones, Instituto Tecnológico de Toluca, Metepec, (México) 52140

  • Venue:
  • IBERAMIA '08 Proceedings of the 11th Ibero-American conference on AI: Advances in Artificial Intelligence
  • Year:
  • 2008

Abstract

Categorical data clustering constitutes an important part of data mining; its relevance has recently drawn attention from several researchers. As a step in data mining, however, clustering faces the problem of large amounts of data to be processed. This article offers a solution for categorical clustering algorithms working with high volumes of data: a method that summarizes the database using a structure called the CM-tree. To test our method, the K-Modes and Click clustering algorithms were run on several databases. Experiments demonstrate that the proposed summarization method improves execution time without losing clustering quality.
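
The general idea of clustering a summary instead of the raw database can be illustrated with a minimal Python sketch. It is not the CM-tree, whose construction the abstract does not describe. As a stand-in assumption, the summary here simply collapses identical records into (record, count) pairs, and a small K-Modes-style loop with Hamming distance then runs on those weighted pairs. All function names (summarize, hamming, weighted_mode, assign, k_modes) and the toy records are hypothetical.

```python
from collections import Counter

def summarize(records):
    """Stand-in summary (NOT the CM-tree): identical records -> (record, count)."""
    return list(Counter(map(tuple, records)).items())

def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def weighted_mode(members):
    """Most frequent value per attribute, weighting each record by its count."""
    n_attrs = len(members[0][0])
    mode = []
    for j in range(n_attrs):
        tally = Counter()
        for rec, weight in members:
            tally[rec[j]] += weight
        mode.append(tally.most_common(1)[0][0])
    return tuple(mode)

def assign(summary, modes):
    """Place every summarized record in the cluster of its nearest mode."""
    clusters = [[] for _ in modes]
    for rec, weight in summary:
        nearest = min(range(len(modes)), key=lambda i: hamming(rec, modes[i]))
        clusters[nearest].append((rec, weight))
    return clusters

def k_modes(summary, k, max_iter=20):
    """K-Modes-style loop over weighted records (assumed simplification)."""
    # Deterministic initialization: the k most frequent distinct records.
    modes = [rec for rec, _ in sorted(summary, key=lambda p: -p[1])[:k]]
    for _ in range(max_iter):
        clusters = assign(summary, modes)
        new_modes = [weighted_mode(c) if c else m
                     for c, m in zip(clusters, modes)]
        if new_modes == modes:
            break
        modes = new_modes
    return modes, assign(summary, modes)

# Toy data: six records, three distinct.
records = [
    ("red", "small"), ("red", "small"), ("red", "small"),
    ("blue", "large"), ("blue", "large"), ("red", "large"),
]
summary = summarize(records)
modes, clusters = k_modes(summary, k=2)
```

In this sketch the clustering loop touches three weighted entries rather than six raw records; any time saving on real data depends on how much the summary compresses the database.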