Minimum Entropy Clustering and Applications to Gene Expression Analysis

Authors:
Haifeng Li;Keshu Zhang;Tao Jiang
Affiliations:
University of California at Riverside;Rensselaer Polytechnic Institute;University of California at Riverside
Venue:
CSB '04 Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference
Year:
2004

Abstract

Clustering is a common methodology for analyzing gene expression data. In this paper, we present a new clustering algorithm from an information-theoretic point of view. First, we propose the minimum entropy criterion (measured on a posteriori probabilities), which is the conditional entropy of clusters given the observations. Fano's inequality indicates that it could be a good criterion for clustering. We generalize the criterion by replacing Shannon's entropy with Havrda-Charvat's structural α-entropy. Interestingly, the minimum entropy criterion based on structural α-entropy is equal to the probability of error of the nearest neighbor method when α = 2. This is further evidence that the proposed criterion is good for clustering. With a non-parametric approach for estimating a posteriori probabilities, an efficient iterative algorithm is then established to minimize the entropy. The experimental results show that the clustering algorithm performs significantly better than k-means/medians, hierarchical clustering, SOM, and EM in terms of the adjusted Rand index. In particular, our algorithm performs very well even when the correct number of clusters is unknown. In addition, most clustering algorithms produce poor partitions in the presence of outliers, while our method can correctly reveal the structure of the data and effectively identify outliers simultaneously.
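
To make the criterion concrete, the following is a minimal sketch (not the authors' implementation) of the two entropies named in the abstract, evaluated on a table of a posteriori probabilities p(c | x). It assumes the standard Havrda-Charvat normalization H_α(p) = (Σ p_i^α − 1) / (2^(1−α) − 1), which the abstract does not state. The function names and the `posteriors` argument are hypothetical, and the posteriors are taken as given rather than estimated non-parametrically. Under this normalization the α = 2 case reduces to 2(1 − Σ p_i²), which is proportional to the classical asymptotic nearest-neighbor error 1 − Σ p_i².

```python
import math


def shannon_entropy(p):
    """Shannon entropy, in bits, of a discrete distribution p."""
    return -sum(q * math.log2(q) for q in p if q > 0)


def structural_entropy(p, alpha):
    """Havrda-Charvat structural alpha-entropy of p.

    Assumed normalization: (sum_i p_i**alpha - 1) / (2**(1 - alpha) - 1),
    defined for alpha > 0 and alpha != 1. It tends to the Shannon
    entropy (in bits) as alpha approaches 1.
    """
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    return (sum(q ** alpha for q in p) - 1.0) / (2.0 ** (1.0 - alpha) - 1.0)


def mean_conditional_entropy(posteriors, alpha=None):
    """Average entropy of the rows p(c | x_i) over all observations.

    posteriors: list of rows, one per observation, each summing to 1.
    alpha=None selects Shannon entropy; otherwise the structural
    alpha-entropy is used. This is an empirical stand-in for the
    conditional entropy of clusters given the observations.
    """
    if alpha is None:
        values = [shannon_entropy(row) for row in posteriors]
    else:
        values = [structural_entropy(row, alpha) for row in posteriors]
    return sum(values) / len(values)


if __name__ == "__main__":
    crisp = [[1.0, 0.0], [0.0, 1.0]]   # every point clearly assigned
    fuzzy = [[0.5, 0.5], [0.7, 0.3]]   # ambiguous assignments
    print(mean_conditional_entropy(crisp), mean_conditional_entropy(fuzzy))
    print(mean_conditional_entropy(crisp, alpha=2),
          mean_conditional_entropy(fuzzy, alpha=2))
```

A partition in which every observation belongs to one cluster with probability 1 scores zero under both entropies, while ambiguous assignments score higher, which is why minimizing this quantity favors well-separated clusters.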