BASSUM: A Bayesian semi-supervised method for classification feature selection

Authors:
Ruichu Cai; Zhenjie Zhang; Zhifeng Hao
Affiliations:
Faculty of Computer Science, Guangdong University of Technology, 510006 Guangzhou, PR China; School of Computing, National University of Singapore, Singapore 117417, Singapore; Faculty of Computer Science, Guangdong University of Technology, 510006 Guangzhou, PR China
Venue:
Pattern Recognition
Year:
2011

Abstract

Feature selection is an important preprocessing step for building efficient, generalizable and interpretable classifiers on high-dimensional data sets. Given sufficient labelled samples, the Markov blanket provides a complete and sound solution to the selection of optimal features by exploiting the conditional independence relationships among the features. In real-world applications, unfortunately, unlabelled samples are usually easy to collect, whereas accurate labels for them are expensive to obtain. As a result, valuable classification information buried in the unlabelled samples may be wasted. In this paper, we propose a new BAyesian Semi-SUpervised Method, or BASSUM for short, to exploit the value of unlabelled samples in the classification feature selection problem. Generally speaking, including unlabelled samples helps the feature selection algorithm by (1) pinpointing more specific conditional independence tests involving fewer features and (2) improving the robustness of individual conditional independence tests with additional statistical information. Our experimental results show that BASSUM enhances the efficiency of traditional feature selection methods and overcomes the difficulties with redundant features in existing semi-supervised solutions.
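The second benefit named in the abstract, more robust conditional independence tests, can be illustrated with a minimal sketch. This is not the BASSUM algorithm itself. It only shows that a test between two features does not involve the class label, so unlabelled samples can be pooled with labelled ones to enlarge the sample behind the test. The G² statistic is a standard choice for discrete data; the function names `g_squared` and `pool_features` and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def g_squared(rows, x, y, z=()):
    """G^2 statistic for testing X independent of Y given Z on discrete data.

    rows: sequence of tuples of discrete values
    x, y: column indices of the two variables under test
    z:    tuple of column indices of the conditioning set (may be empty)
    Returns (statistic, degrees_of_freedom).
    """
    n_xyz, n_xz, n_yz, n_z = Counter(), Counter(), Counter(), Counter()
    for r in rows:
        zv = tuple(r[i] for i in z)
        n_xyz[(r[x], r[y], zv)] += 1
        n_xz[(r[x], zv)] += 1
        n_yz[(r[y], zv)] += 1
        n_z[zv] += 1

    g2 = 0.0
    for (xv, yv, zv), n in n_xyz.items():
        # Expected count under conditional independence within stratum zv.
        expected = n_xz[(xv, zv)] * n_yz[(yv, zv)] / n_z[zv]
        g2 += 2.0 * n * math.log(n / expected)

    x_levels = len({r[x] for r in rows})
    y_levels = len({r[y] for r in rows})
    dof = (x_levels - 1) * (y_levels - 1) * len(n_z)
    return g2, dof


def pool_features(labelled, unlabelled, n_features):
    """Drop the label column from labelled rows and append unlabelled rows."""
    return [r[:n_features] for r in labelled] + list(unlabelled)


# Hypothetical toy data: labelled rows are (f0, f1, label), unlabelled rows are (f0, f1).
labelled = [(0, 0, 1)] * 3 + [(1, 1, 0)] * 3 + [(0, 1, 1)] + [(1, 0, 0)]
unlabelled = [(0, 0)] * 15 + [(1, 1)] * 15 + [(0, 1)] * 5 + [(1, 0)] * 5

g2_labelled, dof = g_squared([r[:2] for r in labelled], 0, 1)
g2_pooled, _ = g_squared(pool_features(labelled, unlabelled, 2), 0, 1)
print(g2_labelled, g2_pooled, dof)
```

With the same underlying dependence between the two features, the pooled sample yields a larger statistic at the same degrees of freedom, so the test has more evidence for a decision. Tests that involve the label itself can still use only the labelled rows.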