Feature selection in scientific applications

Authors:
Erick Cantú-Paz; Shawn Newsam; Chandrika Kamath
Affiliations:
Lawrence Livermore National Laboratory, Livermore, CA; Lawrence Livermore National Laboratory, Livermore, CA; Lawrence Livermore National Laboratory, Livermore, CA
Venue:
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2004



Abstract

Numerous applications of data mining to scientific data involve the induction of a classification model. In many cases, the data are not collected with this task in mind and may therefore contain irrelevant or redundant features that negatively affect the accuracy of the induction algorithms. The size and dimensionality of typical scientific data make it difficult to use any available domain information to identify features that discriminate between the classes of interest. Similarly, exploratory data analysis techniques have limitations on the amount and dimensionality of the data they can process effectively. In this paper, we describe applications of efficient feature selection methods to data sets from astronomy, plasma physics, and remote sensing. We use variations of recently proposed filter methods as well as traditional wrapper approaches, where practical. We discuss the general challenges of feature selection in scientific data sets, the strategies for success that were common among our diverse applications, and the lessons learned in solving these problems.
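
The abstract contrasts filter methods with wrapper approaches. The difference can be illustrated with a minimal sketch; it is not the authors' implementation, and every specific in it is an assumption made for illustration: the Fisher-style score used as the filter criterion, the leave-one-out 1-nearest-neighbour classifier used inside the wrapper, the greedy forward search, and the synthetic three-feature data set. A filter scores each feature from the data alone, while a wrapper evaluates candidate subsets by the accuracy of an induction algorithm, which is typically more expensive.

```python
import random
from statistics import mean, pstdev


def fisher_score(values, labels):
    """Filter criterion (assumed): squared gap between class means over summed variances."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    spread = pstdev(a) ** 2 + pstdev(b) ** 2
    return (mean(a) - mean(b)) ** 2 / spread if spread > 0 else 0.0


def filter_rank(X, y):
    """Rank feature indices best-first without training any classifier."""
    n_features = len(X[0])
    scores = [fisher_score([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def loo_1nn_accuracy(X, y, subset):
    """Wrapper criterion (assumed): leave-one-out accuracy of 1-NN on the given features."""
    correct = 0
    for i, row in enumerate(X):
        best_dist, best_label = float("inf"), None
        for k, other in enumerate(X):
            if k == i:
                continue  # hold out the query point itself
            dist = sum((row[j] - other[j]) ** 2 for j in subset)
            if dist < best_dist:
                best_dist, best_label = dist, y[k]
        correct += best_label == y[i]
    return correct / len(X)


def wrapper_forward(X, y):
    """Greedy forward selection: keep adding the feature that most improves accuracy."""
    selected, best_acc = [], 0.0
    remaining = set(range(len(X[0])))
    while remaining:
        acc, j = max((loo_1nn_accuracy(X, y, selected + [j]), j) for j in sorted(remaining))
        if acc <= best_acc:
            break  # no candidate improves the current subset
        selected.append(j)
        remaining.remove(j)
        best_acc = acc
    return selected, best_acc


# Hypothetical data: feature 0 separates the two classes, features 1 and 2 are noise.
rng = random.Random(0)
y = [i % 2 for i in range(60)]
X = [[label * 3.0 + rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)] for label in y]

ranking = filter_rank(X, y)             # one pass over the features
selected, acc = wrapper_forward(X, y)   # repeated classifier evaluations
print("filter ranking:", ranking)
print("wrapper subset:", selected, "accuracy:", round(acc, 3))
```

In this sketch the filter needs one scoring pass per feature, whereas the wrapper re-evaluates the classifier for every candidate subset it considers. That cost difference is one reason a wrapper may only be practical on smaller data sets.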