Kernel k-means for categorical data

Authors: Julia Couto
Affiliations: James Madison University, Harrisonburg, VA
Venue: IDA'05 Proceedings of the 6th international conference on Advances in Intelligent Data Analysis
Year: 2005

Abstract

Clustering categorical data is an important and challenging data analysis task. In this paper, we explore the use of kernel k-means to cluster categorical data. We propose a new kernel function based on Hamming distance that embeds categorical data in a constructed feature space, where the clustering is conducted. We experimentally evaluate the quality of the solutions produced by kernel k-means on real datasets. The results indicate that kernel k-means with the proposed kernel function can feasibly discover clusters embedded in categorical data.
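
The idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the proposed kernel's formula, so the code below assumes a simple stand-in: the overlap kernel k(x, y) = 1 - d_H(x, y) / m, where d_H is the Hamming distance and m is the number of attributes. This is a valid kernel because it equals a scaled inner product of one-hot encodings, but it is not necessarily the kernel from the paper. The function names, the toy records, and the fixed initial assignment are all hypothetical and chosen only for illustration.

```python
import numpy as np

def overlap_kernel(X):
    """Assumed stand-in kernel: fraction of attributes on which two records agree.

    Equals 1 - hamming_distance / m. Not necessarily the paper's kernel.
    """
    X = np.asarray(X)
    m = X.shape[1]
    # mismatches[i, j] = Hamming distance between records i and j
    mismatches = (X[:, None, :] != X[None, :, :]).sum(axis=2)
    return 1.0 - mismatches / m

def kernel_kmeans(K, k, init_labels, max_iter=100):
    """Kernel k-means using only the Gram matrix K.

    Squared feature-space distance from point i to the mean of cluster c:
        K[i, i] - 2 * mean_j K[i, j] + mean_{j, l} K[j, l],  with j, l in c.
    """
    labels = np.asarray(init_labels).copy()
    n = K.shape[0]
    for _ in range(max_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            mask = labels == c
            size = mask.sum()
            if size == 0:
                continue  # an empty cluster stays unreachable (distance = inf)
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, mask].sum(axis=1) / size
                          + K[np.ix_(mask, mask)].sum() / size ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
    return labels

# Hypothetical toy records: two groups that share no attribute values.
X = [
    ["red",  "small", "round",  "matte"],
    ["red",  "small", "round",  "glossy"],
    ["red",  "small", "square", "matte"],
    ["blue", "large", "flat",   "striped"],
    ["blue", "large", "flat",   "plain"],
    ["blue", "large", "oval",   "striped"],
]
K = overlap_kernel(X)
# Start from a deliberately wrong assignment (record 3 placed in cluster 0).
labels = kernel_kmeans(K, k=2, init_labels=[0, 0, 0, 0, 1, 1])
print(labels)  # [0 0 0 1 1 1]
```

Because the algorithm touches the data only through K, any other positive semidefinite kernel on categorical records could be substituted without changing `kernel_kmeans`. Like ordinary k-means, the procedure converges to a local optimum, so in practice several initial assignments would typically be tried.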