Feature weighting is one of the most popular and effective ways to improve clustering quality, yet choosing a proper weighting method for a given data set is widely recognized as a difficult problem. For most weighting schemes and combination weighting methods, the traditional approach is to evaluate a feature weighting by measuring the quality of the clustering it produces. This is time-consuming, however, because the clustering algorithm must be run many times, with the number of runs depending on the number of weighting schemes or the number of combination weighting iterations. To address this issue, we propose to apply mutual information to predict the performance of feature weighting: the quality of a weighting is judged by the gain in mutual information it yields. The top s weighted data representations can therefore be selected from the set of candidate weighted representations, and the best (or second-best) clustering result is then obtained by clustering only those top s representations. Experimental results show that the mutual-information evaluation reduces running time without sacrificing clustering quality.
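The selection scheme described above can be sketched in code. This is a minimal illustration, not the paper's method: the `mi_score` criterion below (total histogram-estimated mutual information between each weighted feature and the first principal direction of the weighted data) is a hypothetical stand-in for the paper's mutual-information gain, and the tiny k-means and the candidate weight vectors are likewise assumptions for the sake of a self-contained example. The point it demonstrates is the workflow: score every candidate weighting cheaply with MI, keep only the top s, and run the (expensive) clustering on those alone.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Histogram estimate of mutual information between two 1-D arrays (nats)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def mi_score(X, w):
    """Hypothetical MI-based proxy score for a weight vector w: sum of MI
    between each weighted feature and the first principal direction of the
    weighted data. (Illustrative only; not the paper's exact criterion.)"""
    Xw = X * w
    proj = Xw @ np.linalg.svd(Xw - Xw.mean(0), full_matrices=False)[2][0]
    return sum(mutual_information(Xw[:, j], proj) for j in range(X.shape[1]))

def kmeans(X, k, iters=50, seed=0):
    """Bare-bones Lloyd's k-means; returns labels and within-cluster inertia."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == c].mean(0) if (labels == c).any()
                            else centers[c] for c in range(k)])
    inertia = float(((X - centers[labels]) ** 2).sum())
    return labels, inertia

def select_and_cluster(X, weight_vectors, s=2, k=3):
    """Rank candidate weightings by the MI proxy, keep the top s, and run
    k-means only on those instead of on every candidate weighting."""
    ranked = sorted(weight_vectors, key=lambda w: mi_score(X, w), reverse=True)
    results = [(w, *kmeans(X * w, k)) for w in ranked[:s]]
    return min(results, key=lambda r: r[2])  # best of the top s by inertia
```

With m candidate weightings, the naive approach runs k-means m times; here k-means runs only s times, and the remaining m - s candidates cost one cheap MI evaluation each.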