Benchmarking Attribute Selection Techniques for Discrete Class Data Mining

Authors: Mark A. Hall; Geoffrey Holmes
Venue: IEEE Transactions on Knowledge and Data Engineering
Year: 2003

References (9):
C4.5: programs for machine learning (book).
Estimating attributes: analysis and extensions of RELIEF. ECML-94 Proceedings of the European conference on machine learning on Machine Learning.
Selection of relevant features and examples in machine learning. Artificial Intelligence - Special issue on relevance.
Wrappers for feature subset selection. Artificial Intelligence - Special issue on relevance.
Inductive learning algorithms and representations for text categorization. Proceedings of the seventh international conference on Information and knowledge management.
A Practical Approach to Feature Selection. ML '92 Proceedings of the Ninth International Workshop on Machine Learning.
A Comparative Study on Feature Selection in Text Categorization. ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning.
An adaptation of Relief for attribute estimation in regression. ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning.
Correlation-based Feature Selection for Discrete and Numeric Class Machine Learning. ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning.

Cited by (117):
Hybridized rough set framework for classification: an experimental view. Design and application of hybrid intelligent systems.
A Comprehensive and Automated Approach to Intelligent Business Processes Execution Analysis. Distributed and Parallel Databases.
Application of Probabilistic Neural Networks to the Class Prediction of Leukemia and Embryonal Tumor of Central Nervous System. Neural Processing Letters.
A pitfall and solution in multi-class feature selection for text classification. ICML '04 Proceedings of the twenty-first international conference on Machine learning.
Classification and knowledge discovery in protein databases. Journal of Biomedical Informatics - Special issue: Biomedical machine learning.
Feature influence for evolutionary learning. GECCO '05 Proceedings of the 7th annual conference on Genetic and evolutionary computation.
A New Dependency and Correlation Analysis for Features. IEEE Transactions on Knowledge and Data Engineering.
Feature subset selection can improve software cost estimation accuracy. PROMISE '05 Proceedings of the 2005 workshop on Predictor models in software engineering.
Identifying Simple Discriminatory Gene Vectors with an Information Theory Approach. CSB '05 Proceedings of the 2005 IEEE Computational Systems Bioinformatics Conference.
Finding the Right Data for Software Cost Modeling. IEEE Software.
Specialization and extrapolation of software cost models. Proceedings of the 20th IEEE/ACM international Conference on Automated software engineering.
Exploiting partial decision trees for feature subset selection in e-mail categorization. Proceedings of the 2006 ACM symposium on Applied computing.
A bioinformatics framework for genotype-phenotype correlation in humans with Marfan syndrome caused by FBN1 gene mutations. Journal of Biomedical Informatics.
A machine learning perspective on the development of clinical decision support systems utilizing mass spectra of blood samples. Journal of Biomedical Informatics.
Incremental wrapper-based gene selection from microarray data for cancer classification. Pattern Recognition.
Classification of gene-expression data: The manifold-based metric learning way. Pattern Recognition.
An intelligent learning diagnosis system for Web-based thematic learning platform. Computers & Education.
Topological approaches to covering rough sets. Information Sciences: an International Journal.
Refining decision tree classifiers using rough set tools. International Journal of Hybrid Intelligent Systems - Hybrid Intelligence using rough sets.
Data Mining Static Code Attributes to Learn Defect Predictors. IEEE Transactions on Software Engineering.
Selecting Best Practices for Effort Estimation. IEEE Transactions on Software Engineering.
Stability of feature selection algorithms: a study on high-dimensional spaces. Knowledge and Information Systems.
Column Pruning Beats Stratification in Effort Estimation. PROMISE '07 Proceedings of the Third International Workshop on Predictor Models in Software Engineering.
Feature selection and classification model construction on type 2 diabetic patients' data. Artificial Intelligence in Medicine.
The business case for automated software engineering. Proceedings of the twenty-second IEEE/ACM international conference on Automated software engineering.
Processing forecasting queries. VLDB '07 Proceedings of the 33rd international conference on Very large data bases.
The fitness-rough: A new attribute reduction method based on statistical and rough set theory. Intelligent Data Analysis.
A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction. Proceedings of the 30th international conference on Software engineering.
Classification of Ligase Function Based on Multi-parametric Feature Extracted from Protein Sequence. ICCSA '08 Proceedings of the international conference on Computational Science and Its Applications, Part II.
Situation Assessment for Plan Retrieval in Real-Time Strategy Games. ECCBR '08 Proceedings of the 9th European conference on Advances in Case-Based Reasoning.
Split Criterions for Variable Selection Using Decision Trees. ECSQARU '07 Proceedings of the 9th European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty.
Integrating in-process software defect prediction with association mining to discover defect pattern. Information and Software Technology.
A feature selection technique for generation of classification committees and its application to categorization of laryngeal images. Pattern Recognition.
Classification models for the prediction of clinicians' information needs. Journal of Biomedical Informatics.
Feature selection with dynamic mutual information. Pattern Recognition.
Countering imbalanced datasets to improve adverse drug event predictive models in labor and delivery. Journal of Biomedical Informatics.
Validation of network measures as indicators of defective modules in software systems. PROMISE '09 Proceedings of the 5th International Conference on Predictor Models in Software Engineering.
On the value of combining feature subset selection with genetic algorithms: faster learning of coverage models. PROMISE '09 Proceedings of the 5th International Conference on Predictor Models in Software Engineering.
Demoting redundant features to improve the discriminatory ability in cancer data. Journal of Biomedical Informatics.
Automatic online news monitoring and classification for syndromic surveillance. Decision Support Systems.
A decision rule-based method for feature selection in predictive data mining. Expert Systems with Applications: An International Journal.
An optimization of ReliefF for classification in large datasets. Data & Knowledge Engineering.
GUEST EDITORIAL: Computational intelligence in solving bioinformatics problems. Artificial Intelligence in Medicine.
Efficient feature weighting methods for ranking. Proceedings of the 18th ACM conference on Information and knowledge management.
Finding robust solutions in requirements models. Automated Software Engineering.
An empirical investigation of filter attribute selection techniques for software quality classification. IRI'09 Proceedings of the 10th IEEE international conference on Information Reuse & Integration.
Identifying Fewer Key Factors by Attribute Selection Methodologies to Understand the Hospital Admission Prediction Pattern with Ant Miner and C4.5. KES '09 Proceedings of the 13th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems: Part II.
An Omnibus Permutation Test on Ensembles of Two-Locus Analyses for the Detection of Purely Epistatic Multi-locus Interactions. ICONIP '09 Proceedings of the 16th International Conference on Neural Information Processing: Part II.
Improved variable and value ranking techniques for mining categorical traffic accident data. Expert Systems with Applications: An International Journal.
Reducing the number of DNA primers for classifying pejibaye palm races using SVM. E-ACTIVITIES'09/ISP'09 Proceedings of the 8th WSEAS International Conference on E-Activities and information security and privacy.
Tumor classification by combining PNN classifier ensemble with neighborhood rough set based gene reduction. Computers in Biology and Medicine.
Using biclustering for automatic attribute selection to enhance global visualization. VIEW'06 Proceedings of the 1st first visual information expert conference on Pixelization paradigm.
Efficient feature selection in the presence of outliers and noises. AIRS'08 Proceedings of the 4th Asia information retrieval conference on Information retrieval technology.
A novel metric for redundant gene elimination based on discriminative contribution. ISBRA'08 Proceedings of the 4th international conference on Bioinformatics research and applications.
Selecting features from multiple feature sets for SVM committee-based screening of human larynx. Expert Systems with Applications: An International Journal.
A novel hybrid feature selection via Symmetrical Uncertainty ranking based local memetic search algorithm. Knowledge-Based Systems.
Empirical evaluation of feature selection methods in classification. Intelligent Data Analysis.
Stable rankings for different effort models. Automated Software Engineering.
Feature selection of RAPD haplotypes for identifying peach palm (Bactris gasipaes) landraces using SVM. WSEAS Transactions on Computers.
Feature selection for brain-computer interfaces. PAKDD'09 Proceedings of the 13th Pacific-Asia international conference on Knowledge discovery and data mining: new frontiers in applied data mining.
Feature set reduction by evolutionary selection and construction. KES-AMSTA'10 Proceedings of the 4th KES international conference on Agent and multi-agent systems: technologies and applications, Part II.
Adaptive particle swarm optimizer for feature selection. IDEAL'10 Proceedings of the 11th international conference on Intelligent data engineering and automated learning.
Selecting small audio feature sets in music classification by means of asymmetric mutation. PPSN'10 Proceedings of the 11th international conference on Parallel problem solving from nature: Part I.
Covering based approximation – a new type approach. International Journal of Computational Vision and Robotics.
Computerized methods for the assessment and characterization of the inflammatory bowel diseases and colon cancer from ultrasound and endoscopic images. NEHIPISIC'11 Proceeding of 10th WSEAS international conference on electronics, hardware, wireless and optical communications, and 10th WSEAS international conference on signal processing, robotics and automation, and 3rd WSEAS international conference on nanotechnology, and 2nd WSEAS international conference on Plasma-fusion-nuclear physics.
On the discriminability of keystroke feature vectors used in fixed text keystroke authentication. Pattern Recognition Letters.
A hybrid feature selection method for DNA microarray data. Computers in Biology and Medicine.
A new combined filter-wrapper framework for gene subset selection with specialized genetic operators. MCPR'10 Proceedings of the 2nd Mexican conference on Pattern recognition: Advances in pattern recognition.
Hybrid feature selection method for supervised classification based on Laplacian score ranking. MCPR'10 Proceedings of the 2nd Mexican conference on Pattern recognition: Advances in pattern recognition.
Using intelligence techniques to predict postoperative morbidity of endovascular aneurysm repair. ACIIDS'11 Proceedings of the Third international conference on Intelligent information and database systems - Volume Part I.
An industrial case study of classifier ensembles for locating software defects. Software Quality Control.
Compact features for sentiment analysis. Canadian AI'11 Proceedings of the 24th Canadian conference on Advances in artificial intelligence.
Fast wrapper feature subset selection in high-dimensional datasets by means of filter re-ranking. Knowledge-Based Systems.
Emotion based music visualization system. ISMIS'11 Proceedings of the 19th international conference on Foundations of intelligent systems.
Automating image segmentation verification and validation by learning test oracles. Information and Software Technology.
Classifier ensemble construction with rotation forest to improve medical diagnosis performance of machine learning algorithms. Computer Methods and Programs in Biomedicine.
A soft-computing based rough sets classifier for classifying IPO returns in the financial markets. Applied Soft Computing.
The inductive software engineering manifesto: principles for industrial data mining. Proceedings of the International Workshop on Machine Learning Technologies in Software Engineering.
Binary relation based rough sets. FSKD'06 Proceedings of the Third international conference on Fuzzy Systems and Knowledge Discovery.
EGEA: a new hybrid approach towards extracting reduced generic association rule set (application to AML blood cancer therapy). DaWaK'06 Proceedings of the 8th international conference on Data Warehousing and Knowledge Discovery.
Predicting high-risk program modules by selecting the right software measurements. Software Quality Control.
Special issue on repeatable results in software engineering prediction. Empirical Software Engineering.
Attribute selection and rule generation techniques for medical diagnosis systems. RSFDGrC'05 Proceedings of the 10th international conference on Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing - Volume Part II.
Learning opinions in user-generated web content. Natural Language Engineering.
Analysis of feature rankings for classification. IDA'05 Proceedings of the 6th international conference on Advances in Intelligent Data Analysis.
Feature selection and classification model construction on type 2 diabetic patient’s data. ICDM'04 Proceedings of the 4th international conference on Advances in Data Mining: applications in Image Mining, Medicine and Biotechnology, Management and Environmental Control, and Telecommunications.
The fourth type of covering-based rough sets. Information Sciences: an International Journal.
An application of rough sets to graph theory. Information Sciences: an International Journal.
Software measurement data reduction using ensemble techniques. Neurocomputing.
Analyzing Online Review Helpfulness Using a Regressional ReliefF-Enhanced Text Mining Method. ACM Transactions on Management Information Systems (TMIS).
I-prune: Item selection for associative classification. International Journal of Intelligent Systems.
Intelligent Postoperative Morbidity Prediction of Heart Disease Using Artificial Intelligence Techniques. Journal of Medical Systems.
Evaluating defect prediction approaches: a benchmark and an extensive comparison. Empirical Software Engineering.
Measuring stability of feature ranking techniques: a noise-based approach. International Journal of Business Intelligence and Data Mining.
Evaluation of the importance of data pre-processing order when combining feature selection and data sampling. International Journal of Business Intelligence and Data Mining.
StressSense: detecting stress in unconstrained acoustic environments using smartphones. Proceedings of the 2012 ACM Conference on Ubiquitous Computing.
Extracting performance rules of suppliers in the manufacturing industry: an empirical study. Journal of Intelligent Manufacturing.
Matroidal structure of rough sets and its characterization to attribute reduction. Knowledge-Based Systems.
Combining DTI and MRI for the automated detection of alzheimer’s disease using a large european multicenter dataset. MBIA'12 Proceedings of the Second international conference on Multimodal Brain Image Analysis.
Hybrid approach for diagnosing thyroid, hepatitis, and breast cancer based on correlation based feature selection and Naïve bayes. ICONIP'12 Proceedings of the 19th international conference on Neural Information Processing - Volume Part IV.
Rough matroids based on relations. Information Sciences: an International Journal.
Predicting aging-related bugs using software complexity metrics. Performance Evaluation.
Influence of confirmation biases of developers on software quality: an empirical study. Software Quality Control.
Results on mining NHANES data: A case study in evidence-based medicine. Computers in Biology and Medicine.
Exploitation of 3D stereotactic surface projection for predictive modelling of Alzheimer's disease. International Journal of Data Mining and Bioinformatics.
Effectiveness of state-of-the-art features for microblog search. Proceedings of the 28th Annual ACM Symposium on Applied Computing.
Software effort models should be assessed via leave-one-out validation. Journal of Systems and Software.
A learning-based method for combining testing techniques. Proceedings of the 2013 International Conference on Software Engineering.
Automatic detection of performance deviations in the load testing of large scale systems. Proceedings of the 2013 International Conference on Software Engineering.
An algorithmic approach to missing data problem in modeling human aspects in software development. Proceedings of the 9th International Conference on Predictive Models in Software Engineering.
A study of subgroup discovery approaches for defect prediction. Information and Software Technology.
'How can i help you': comparing engagement classification strategies for a robot bartender. Proceedings of the 15th ACM on International conference on multimodal interaction.
Comparative assessment of feature selection and classification techniques for visual inspection of pot plant seedlings. Computers and Electronics in Agriculture.
An investigation into the application of ensemble learning for entailment classification. Information Processing and Management: an International Journal.
Finding conclusion stability for selecting the best effort predictor in software effort estimation. Automated Software Engineering.
An approach to dimensionality reduction in time series. Information Sciences: an International Journal.
Updating attribute reduction in incomplete decision systems with the variation of attribute set. International Journal of Approximate Reasoning.

Abstract

Data engineering is generally considered a central issue in the development of data mining applications. The success of many learning schemes, in their attempts to construct models of data, hinges on the reliable identification of a small set of highly predictive attributes. Including irrelevant, redundant, and noisy attributes in the model-building phase can result in poor predictive performance and increased computation. Attribute selection generally involves a combination of search and attribute utility estimation, plus evaluation with respect to specific learning schemes. The number of possible combinations of these components is large, and as a result very few benchmark studies have been conducted. This paper presents a benchmark comparison of several attribute selection methods for supervised classification. All the methods produce an attribute ranking, a useful device for isolating the individual merit of an attribute. Attribute selection is achieved by cross-validating the attribute rankings with respect to a classification learner to find the best attributes. Results are reported for a selection of standard data sets and two diverse learning schemes: C4.5 and naive Bayes.
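The ranking-then-cross-validation procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. Information gain (an entropy-based criterion) stands in for the ranker, and a Laplace-smoothed categorical naive Bayes stands in for the learner. The function and class names (`info_gain`, `rank_attributes`, `NaiveBayes`, `select_by_ranking`) and the toy data are hypothetical. The ranking is recomputed on each training fold, so the held-out fold never influences the ordering. The subset size k with the highest cross-validated accuracy is kept, and ties are resolved toward fewer attributes.

```python
import math
from collections import Counter, defaultdict


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def info_gain(column, labels):
    """Information gain H(C) - H(C | A) of one discrete attribute column A (assumed ranker)."""
    n = len(labels)
    groups = defaultdict(list)  # attribute value -> class labels observed with that value
    for value, label in zip(column, labels):
        groups[value].append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_attributes(rows, labels):
    """Attribute indices sorted by decreasing information gain."""
    n_attrs = len(rows[0])
    gains = [info_gain([r[j] for r in rows], labels) for j in range(n_attrs)]
    return sorted(range(n_attrs), key=lambda j: -gains[j])


class NaiveBayes:
    """Categorical naive Bayes with Laplace smoothing over a chosen attribute subset (assumed learner)."""

    def fit(self, rows, labels, attrs):
        self.attrs = list(attrs)
        self.n = len(labels)
        self.class_counts = Counter(labels)
        self.counts = defaultdict(Counter)  # (attr, class) -> counts of attribute values
        self.values = defaultdict(set)      # attr -> attribute values seen in training
        for row, label in zip(rows, labels):
            for j in self.attrs:
                self.counts[(j, label)][row[j]] += 1
                self.values[j].add(row[j])
        return self

    def predict(self, row):
        """Return the class with the highest log posterior (up to a constant)."""
        best, best_score = None, -math.inf
        for label, n_c in self.class_counts.items():
            score = math.log(n_c / self.n)
            for j in self.attrs:
                k = len(self.values[j])
                score += math.log((self.counts[(j, label)][row[j]] + 1) / (n_c + k))
            if score > best_score:
                best, best_score = label, score
        return best


def select_by_ranking(rows, labels, folds=5):
    """Keep the top-k ranked attributes, with k chosen to maximise cross-validated accuracy."""
    n, n_attrs = len(rows), len(rows[0])
    correct = [0] * (n_attrs + 1)  # correct[k]: test-fold hits using the top-k attributes
    for f in range(folds):
        train = [i for i in range(n) if i % folds != f]
        test = [i for i in range(n) if i % folds == f]
        tr_rows = [rows[i] for i in train]
        tr_labels = [labels[i] for i in train]
        ranking = rank_attributes(tr_rows, tr_labels)  # rank on the training fold only
        for k in range(1, n_attrs + 1):
            model = NaiveBayes().fit(tr_rows, tr_labels, ranking[:k])
            correct[k] += sum(model.predict(rows[i]) == labels[i] for i in test)
    best_k = max(range(1, n_attrs + 1), key=lambda k: correct[k])  # first max = fewest attributes
    return rank_attributes(rows, labels)[:best_k], correct[best_k] / n


if __name__ == "__main__":
    # Hypothetical toy data: attribute 0 determines the class; attributes 1 and 2 are noise.
    rows = [(i % 2, (i // 2) % 3, (i // 6) % 2) for i in range(40)]
    labels = ["yes" if r[0] else "no" for r in rows]
    print(select_by_ranking(rows, labels))  # prints ([0], 1.0)
```

To keep the sketch short, folds are assigned by index modulo `folds`, without shuffling or stratification. A benchmark would normally stratify and repeat the cross-validation. You could substitute another ranking criterion or learner, for example a C4.5-style decision tree, by replacing `info_gain` or `NaiveBayes`. The replacement learner must expose the same `fit(rows, labels, attrs)` and `predict(row)` methods.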