Data Reduction Method for Categorical Data Clustering

Authors:
Eréndira Rendón;J. Salvador Sánchez;Rene A. Garcia;Itzel Abundez;Citlalih Gutierrez;Eduardo Gasca
Affiliations:
Lab. Reconocimiento de Patrones, Instituto Tecnológico de Toluca, Metepec, (México) 52140;Dept. Llenguatges i Sistemes Informàtics, Universitat Jaume I, Castelló de la Plana, (Spain) E-12071;Lab. Reconocimiento de Patrones, Instituto Tecnológico de Toluca, Metepec, (México) 52140;Lab. Reconocimiento de Patrones, Instituto Tecnológico de Toluca, Metepec, (México) 52140;Lab. Reconocimiento de Patrones, Instituto Tecnológico de Toluca, Metepec, (México) 52140;Lab. Reconocimiento de Patrones, Instituto Tecnológico de Toluca, Metepec, (México) 52140
Venue:
IBERAMIA '08 Proceedings of the 11th Ibero-American conference on AI: Advances in Artificial Intelligence
Year:
2008

Abstract

Categorical data clustering constitutes an important part of data mining; its relevance has recently drawn attention from several researchers. As a step in data mining, however, clustering faces the problem of the large amounts of data to be processed. This article offers a solution for categorical clustering algorithms working with high volumes of data: a method that summarizes the database using a structure called the CM-tree. To test the method, the K-Modes and Click clustering algorithms were run on several databases. The experiments show that the proposed summarization method improves execution time without losing clustering quality.
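
The general idea of clustering a summary instead of the raw records can be illustrated with a minimal sketch. It is not the CM-tree, whose structure the abstract does not describe. It assumes the simplest possible summary (identical records collapsed into one weighted representative) and shows the two building blocks usually associated with K-Modes: the simple matching dissimilarity and a per-attribute mode. All function names and the toy data are hypothetical.

```python
from collections import Counter

def summarize(records):
    """Collapse identical categorical records into (record, count) pairs.

    Assumption: exact-duplicate counting stands in for a real summary structure.
    """
    return list(Counter(map(tuple, records)).items())

def matching_dissimilarity(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))

def weighted_mode(summary):
    """Most frequent value per attribute, weighting each record by its count."""
    n_attrs = len(summary[0][0])
    mode = []
    for j in range(n_attrs):
        votes = Counter()
        for record, count in summary:
            votes[record[j]] += count
        mode.append(votes.most_common(1)[0][0])
    return tuple(mode)

# Toy data set (hypothetical): five records, two categorical attributes.
records = [
    ("red", "small"), ("red", "small"), ("red", "large"),
    ("blue", "large"), ("red", "small"),
]
summary = summarize(records)   # three distinct records with their counts
mode = weighted_mode(summary)  # computed from the summary, not the raw data
```

Because the counts carry the multiplicity of each record, the mode computed from the three summary entries equals the mode of the five original records. This is the kind of property a summarization method needs in order to save time without changing the clustering result.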