Text data sets are high-dimensional and therefore hard to analyze: many weakly relevant but redundant features hurt the generalization performance of classifiers. Previous work addresses this problem using pair-wise feature similarities, which ignore the discriminative contribution of each feature because they do not exploit the label information. Here we define an Approximate Markov Blanket (AMB) based on the metric of DIScriminative Contribution (DISC) to eliminate redundant features, and propose the AMB-DISC algorithm. Experimental results on the Reuters-21578 data set show that AMB-DISC substantially outperforms previous state-of-the-art feature selection algorithms that account for feature redundancy, in terms of MicroavgF1 and MacroavgF1.
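To illustrate the general idea of approximate-Markov-blanket filtering described above, the sketch below uses a generic FCBF-style test: features are ranked by a label-aware relevance score (mutual information with the class), and a feature is discarded when an already-selected, more relevant feature "covers" it, i.e. is at least as informative about it as the label is. Note that the paper's actual DISC metric is not specified in this abstract; the mutual-information relevance score and the covering test here are stand-in assumptions, not the authors' definitions.

```python
import numpy as np

def mutual_info(x, y):
    """Mutual information (in nats) between two discrete 1-D arrays."""
    mi = 0.0
    for xv in np.unique(x):
        px = np.mean(x == xv)
        for yv in np.unique(y):
            py = np.mean(y == yv)
            pxy = np.mean((x == xv) & (y == yv))
            if pxy > 0:
                mi += pxy * np.log(pxy / (px * py))
    return mi

def amb_filter(X, y, delta=0.0):
    """FCBF-style approximate-Markov-blanket filter (a generic stand-in
    for AMB-DISC; the DISC metric itself is not given in the abstract).

    X : (n_samples, n_features) array of discrete feature values
    y : (n_samples,) array of class labels
    delta : minimum relevance threshold for keeping a feature
    Returns the indices of the selected (non-redundant) features.
    """
    n_features = X.shape[1]
    # Label-aware relevance of each feature, most relevant first.
    rel = np.array([mutual_info(X[:, j], y) for j in range(n_features)])
    order = np.argsort(-rel, kind="stable")
    selected = []
    for j in order:
        if rel[j] <= delta:
            continue  # weakly relevant: drop outright
        # Approximate-Markov-blanket test: feature j is redundant if some
        # already-selected feature i satisfies I(f_i; f_j) >= I(f_j; y).
        if all(mutual_info(X[:, i], X[:, j]) < rel[j] for i in selected):
            selected.append(j)
    return selected
```

For example, if one feature is an exact copy of another, the copy is covered by the original and removed, while an uninformative feature is dropped by the relevance threshold.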