An unsupervised approach to feature discretization and selection

Authors:
Artur J. Ferreira; Mário A. T. Figueiredo
Affiliations:
Instituto Superior de Engenharia de Lisboa, Polytechnic Institute of Lisbon, Portugal and Instituto de Telecomunicações, Lisboa, Portugal; Instituto Superior Técnico, Technical University of Lisbon, Portugal and Instituto de Telecomunicações, Lisboa, Portugal
Venue:
Pattern Recognition
Year:
2012

Abstract

Many learning problems require handling high-dimensional datasets with a relatively small number of instances. Learning algorithms are thus confronted with the curse of dimensionality and need to address it in order to be effective. Examples of such data include the bag-of-words representation in text classification problems and gene expression data for tumor detection/classification. Usually, among the large number of features characterizing the instances, many may be irrelevant (or even detrimental) for the learning task. There is thus a need for adequate techniques for feature representation, reduction, and selection, to improve classification accuracy and reduce memory requirements. In this paper, we propose combined unsupervised feature discretization and feature selection techniques, suitable for medium- and high-dimensional datasets. Experimental results on several standard datasets, with both sparse and dense features, show the efficiency of the proposed techniques as well as improvements over previous related techniques.