Learning with many irrelevant features

Authors:
Hussein Almuallim;Thomas G. Dietterich
Affiliations:
Department of Computer Science, Oregon State University, Corvallis, OR;Department of Computer Science, Oregon State University, Corvallis, OR
Venue:
AAAI'91 Proceedings of the ninth National conference on Artificial intelligence - Volume 2
Year:
1991



Abstract

In many domains, an appropriate inductive bias is the MIN-FEATURES bias, which prefers consistent hypotheses definable over as few features as possible. This paper defines and studies this bias. First, it is shown that any learning algorithm implementing the MIN-FEATURES bias requires Θ((1/ε) ln(1/δ) + (1/ε)[2^p + p ln n]) training examples to guarantee PAC-learning a concept having p relevant features out of n available features. This bound is only logarithmic in the number of irrelevant features. The paper also presents a quasi-polynomial time algorithm, FOCUS, which implements MIN-FEATURES. Experimental studies are presented that compare FOCUS to the ID3 and FRINGE algorithms. These experiments show that, contrary to expectations, these algorithms do not implement good approximations of MIN-FEATURES. The coverage, sample complexity, and generalization performance of FOCUS are substantially better than those of either ID3 or FRINGE on learning problems where the MIN-FEATURES bias is appropriate. This suggests that, in practical applications, training data should be preprocessed to remove irrelevant features before being given to ID3 or FRINGE.
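
To make the MIN-FEATURES bias concrete, the following is a minimal Python sketch of a brute-force search for a smallest feature subset that is consistent with the training data. The abstract does not give the steps of FOCUS, so this is an illustration of the bias under stated assumptions, not the authors' implementation: the function names `is_sufficient` and `min_features`, the representation of examples as `(feature_tuple, label)` pairs, and the smallest-first enumeration order are all hypothetical choices made here.

```python
from itertools import combinations


def is_sufficient(examples, subset):
    """Return True if no two examples agree on `subset` but differ in label.

    `examples` is a list of (features, label) pairs; `subset` is a tuple of
    feature indices. (Both representations are assumptions of this sketch.)
    """
    seen = {}
    for features, label in examples:
        key = tuple(features[i] for i in subset)
        # setdefault stores the first label seen for this projection and
        # returns the stored label on later visits.
        if seen.setdefault(key, label) != label:
            return False
    return True


def min_features(examples, n_features):
    """Search feature subsets in order of increasing size.

    Returns the first subset (as a tuple of indices) under which the
    examples are still consistent, or None if the data are contradictory
    even when all features are used.
    """
    for size in range(n_features + 1):
        for subset in combinations(range(n_features), size):
            if is_sufficient(examples, subset):
                return subset
    return None
```

Given a complete truth table for a target such as the XOR of features 0 and 2 among four features, this search skips every subset of size 0 and 1 and stops at the pair of relevant features. Because it enumerates subsets by size, its cost grows with the number of subsets of size at most p, which is why a small number of relevant features p matters much more than the total number n.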