G-ANMI: A mutual information based genetic clustering algorithm for categorical data

Authors:
Shengchun Deng; Zengyou He; Xiaofei Xu
Affiliations:
Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, China; Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China; Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, China
Venue:
Knowledge-Based Systems
Year:
2010

Abstract

Identifying meaningful clusters in categorical data is a key problem in data mining. Recently, Average Normalized Mutual Information (ANMI) has been used to formulate categorical data clustering as an optimization problem. To find a globally optimal or near-optimal partition with respect to ANMI, this paper proposes a genetic clustering algorithm (G-ANMI). Experimental results show that G-ANMI is superior or comparable to existing algorithms for clustering categorical data in terms of clustering accuracy.
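
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' G-ANMI implementation. It assumes that each categorical attribute is treated as a partition of the objects, that NMI is normalized by the geometric mean of the two entropies, and that ANMI is the mean NMI between a candidate clustering and all attributes. The genetic operators (truncation selection, one-point crossover, random-reset mutation) and all names and parameter values (`anmi`, `genetic_search`, `pop_size`, `mutation_rate`, and so on) are hypothetical choices made for illustration only.

```python
import math
import random
from collections import Counter

def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """Mutual information between two label sequences of equal length."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log(c * n / (ca[x] * cb[y]))
               for (x, y), c in cab.items())

def nmi(a, b):
    """MI normalized by sqrt(H(a) * H(b)); 0.0 if either entropy is zero."""
    ha, hb = entropy(a), entropy(b)
    if ha == 0 or hb == 0:
        return 0.0
    return mutual_information(a, b) / math.sqrt(ha * hb)

def anmi(partition, rows):
    """Average NMI between a partition and every attribute column (assumed objective)."""
    columns = list(zip(*rows))
    return sum(nmi(partition, col) for col in columns) / len(columns)

def genetic_search(rows, k, pop_size=30, generations=60,
                   mutation_rate=0.05, seed=0):
    """Illustrative GA: a chromosome holds one cluster label per object."""
    rng = random.Random(seed)
    n = len(rows)

    def fitness(p):
        return anmi(p, rows)

    population = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # best half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, n)           # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [rng.randrange(k) if rng.random() < mutation_rate else g
                     for g in child]            # random-reset mutation
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

if __name__ == "__main__":
    # Toy data set: six objects described by three categorical attributes.
    data = [("a", "x", "p"), ("a", "x", "p"), ("a", "x", "q"),
            ("b", "y", "q"), ("b", "y", "q"), ("b", "y", "p")]
    best = genetic_search(data, k=2)
    print(best, round(anmi(best, data), 3))
```

Because the better half of each generation is carried over unchanged, the best ANMI value in the population never decreases from one generation to the next. The search remains stochastic, so the sketch makes no guarantee of reaching the global optimum.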