A Novel Scalable and Data Efficient Feature Subset Selection Algorithm

Authors:
Sergio Rodrigues De Morais; Alex Aussem
Affiliations:
INSA-Lyon, LIESP, Villeurbanne, France F-69622; Université de Lyon 1, LIESP, Villeurbanne, France F-69622
Venue:
ECML PKDD '08 Proceedings of the European conference on Machine Learning and Knowledge Discovery in Databases - Part II
Year:
2008

Abstract

In this paper, we aim to identify the minimal subset of discrete random variables that is relevant for probabilistic classification in data sets with many variables but few instances. A principled solution to this problem is to determine the Markov boundary of the class variable. We present a novel Markov boundary learning algorithm that is scalable, data efficient, and correct under the so-called faithfulness condition. We report extensive empirical experiments on synthetic and real data sets scaling up to 139,351 variables.
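
As an illustration of the central concept only: when a distribution is faithful to a directed acyclic graph, the Markov boundary of a variable consists of its parents, its children, and the other parents of its children (its spouses). The sketch below reads that set off a graph that is already known. It is not the learning algorithm presented in the paper, which has to infer the boundary from data; the function name `markov_boundary` and the toy network are hypothetical and chosen purely for the example.

```python
from typing import Dict, Set


def markov_boundary(parents: Dict[str, Set[str]], target: str) -> Set[str]:
    """Return the parents, children, and spouses of `target` in a DAG.

    `parents` maps every node to the set of its direct parents.
    """
    direct_parents = set(parents.get(target, set()))
    # Children are the nodes that list the target among their parents.
    children = {node for node, ps in parents.items() if target in ps}
    # Spouses are the other parents of the target's children.
    spouses: Set[str] = set()
    for child in children:
        spouses |= parents[child]
    spouses.discard(target)
    return direct_parents | children | spouses


# Hypothetical toy network: A -> T, T -> C, S -> C, C -> D, and X isolated.
toy_dag = {
    "A": set(),
    "S": set(),
    "X": set(),
    "T": {"A"},
    "C": {"T", "S"},
    "D": {"C"},
}

print(sorted(markov_boundary(toy_dag, "T")))  # ['A', 'C', 'S']
```

In this toy network, D and X fall outside the boundary of the class variable T: D is separated from T by C, and X is unconnected, so neither adds information about T once A, C, and S are known.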