k-ANMI: A mutual information based clustering algorithm for categorical data

Authors:
Zengyou He;Xiaofei Xu;Shengchun Deng
Affiliations:
Department of Computer Science and Engineering, Harbin Institute of Technology, 92 West Dazhi Street, P.O. Box 315, 150001, PR China;Department of Computer Science and Engineering, Harbin Institute of Technology, 92 West Dazhi Street, P.O. Box 315, 150001, PR China;Department of Computer Science and Engineering, Harbin Institute of Technology, 92 West Dazhi Street, P.O. Box 315, 150001, PR China
Venue:
Information Fusion
Year:
2008


Abstract

Clustering categorical data is an integral part of data mining and has attracted much attention recently. In this paper, we present k-ANMI, a new efficient algorithm for clustering categorical data. The k-ANMI algorithm works in a way similar to the popular k-means algorithm, and the goodness of the clustering at each step is evaluated using a mutual information based criterion, average normalized mutual information (ANMI), borrowed from cluster ensembles. The algorithm is easy to implement, requiring multiple hash tables as its only major data structure. Experimental results on real datasets show that the k-ANMI algorithm is competitive with state-of-the-art categorical data clustering algorithms with respect to clustering accuracy.
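
The abstract does not spell out the procedure, so the following is only a minimal sketch of what an ANMI-guided, k-means-like search might look like. It is not the authors' implementation. It assumes that each categorical attribute is treated as one partition of the records, that NMI is normalized by the geometric mean of the two entropies (the usual cluster ensemble convention), and that the search greedily moves one record at a time to whichever cluster raises ANMI. The names `anmi` and `k_anmi_sketch`, the round-robin initialization, and the `max_sweeps` parameter are all hypothetical. The sketch recomputes ANMI from scratch using `Counter` (a hash table); the incremental hash table bookkeeping mentioned in the abstract is not reproduced here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labelings of the same records."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(a, b):
    """NMI normalized by sqrt(H(a) * H(b)); defined as 0 if either entropy is 0."""
    denom = math.sqrt(entropy(a) * entropy(b))
    return mutual_information(a, b) / denom if denom > 0 else 0.0


def anmi(labels, records):
    """Average NMI between `labels` and each attribute column (assumed setup)."""
    columns = list(zip(*records))
    return sum(nmi(labels, col) for col in columns) / len(columns)


def k_anmi_sketch(records, k, max_sweeps=20):
    """Greedy single-record reassignment that only accepts ANMI improvements."""
    n = len(records)
    labels = [i % k for i in range(n)]  # round-robin start: an assumption
    best = anmi(labels, records)
    for _ in range(max_sweeps):
        moved = False
        for i in range(n):
            original = labels[i]
            best_cluster = original
            for c in range(k):
                if c == original:
                    continue
                labels[i] = c  # tentatively move record i to cluster c
                score = anmi(labels, records)
                if score > best + 1e-12:
                    best, best_cluster = score, c
            labels[i] = best_cluster  # keep the best move, or restore
            moved = moved or best_cluster != original
        if not moved:  # a full sweep with no change: local optimum
            break
    return labels, best


if __name__ == "__main__":
    data = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
    print(k_anmi_sketch(data, k=2))
```

Each candidate move in this sketch costs a full ANMI recomputation, which is fine for illustration but quadratic in practice; maintaining per-attribute co-occurrence counts incrementally would avoid that cost.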