Clustering categorical data using an extended modularity measure

Authors:
Lazhar Labiod;Nistor Grozavu;Younès Bennani
Affiliations:
LIPN-UMR, Université Paris 13, Villetaneuse, France;LIPN-UMR, Université Paris 13, Villetaneuse, France;LIPN-UMR, Université Paris 13, Villetaneuse, France
Venue:
ICONIP'10 Proceedings of the 17th international conference on Neural information processing: models and applications - Volume Part II
Year:
2010

Citing 5

CACTUS—clustering categorical data using summaries
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining

ROCK: a robust clustering algorithm for categorical attributes
Information Systems

COOLCAT: an entropy-based algorithm for categorical clustering
Proceedings of the eleventh international conference on Information and knowledge management

Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values
Data Mining and Knowledge Discovery

Clustering Categorical Data: An Approach Based on Dynamical Systems
VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases

Cited 2

GPU-Based biclustering for neural information processing
ICONIP'12 Proceedings of the 19th international conference on Neural Information Processing - Volume Part V

GPU-based biclustering for microarray data analysis in neurocomputing
Neurocomputing

Abstract

Newman and Girvan [12] recently proposed an objective function for graph clustering, called the Modularity function, that allows automatic selection of the number of clusters. Empirically, higher values of the Modularity function have been shown to correlate well with good graph clusterings. In this paper we propose an extended Modularity measure for categorical data clustering. We first establish its connection with the Relational Analysis criterion. The proposed Modularity measure introduces an automatic weighting scheme that takes into consideration the profile of each data object. A modified Relational Analysis algorithm is then presented to search for the partitions that maximize the criterion. This algorithm scales linearly to large data sets and identifies natural clusters, i.e. it does not require fixing the number of clusters or the size of each cluster in advance. Experimental results indicate that the new algorithm is efficient and effective at finding both good clusterings and the appropriate number of clusters across a variety of real-world data sets.
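For readers unfamiliar with the Modularity function mentioned in the abstract, the following is a minimal sketch of the standard Newman-Girvan modularity for an undirected graph, Q = (1/2m) Σ_ij (A_ij − k_i k_j / 2m) δ(c_i, c_j). It illustrates only the original graph measure, not the extended categorical measure or the modified Relational Analysis algorithm proposed in the paper, whose definitions are not given in this record. The function name `modularity` and the toy graph are assumptions made for illustration.

```python
def modularity(adjacency, labels):
    """Newman-Girvan modularity Q of a partition of an undirected graph.

    adjacency: symmetric list of lists of edge weights (0/1 if unweighted).
    labels: labels[i] is the cluster assigned to node i.
    """
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    two_m = sum(degrees)  # equals 2m: every edge is counted from both ends
    if two_m == 0:
        return 0.0
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # observed weight minus the weight expected by chance
                q += adjacency[i][j] - degrees[i] * degrees[j] / two_m
    return q / two_m


# Hypothetical toy graph: two triangles (nodes 0-2 and 3-5) joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for u, v in edges:
    A[u][v] = A[v][u] = 1

q_two = modularity(A, [0, 0, 0, 1, 1, 1])  # one cluster per triangle
q_one = modularity(A, [0] * 6)             # all nodes in a single cluster
```

On this toy graph the two-triangle partition scores higher than the single-cluster partition, which is the sense in which maximizing the measure can select the number of clusters without fixing it beforehand.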