Unsupervised joint feature discretization and selection

Authors:
Artur Ferreira;Mário Figueiredo
Affiliations:
Instituto Superior de Engenharia de Lisboa, and Instituto de Telecomunicações, Lisboa, Portugal;Instituto Superior Técnico, and Instituto de Telecomunicações, Lisboa, Portugal
Venue:
IbPRIA'11 Proceedings of the 5th Iberian conference on Pattern recognition and image analysis
Year:
2011

Citing 8
Cited 0

Elements of information theory

Elements of information theory
Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms

Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms
An introduction to variable and feature selection

The Journal of Machine Learning Research
An extensive empirical study of feature selection metrics for text classification

The Journal of Machine Learning Research
Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy

IEEE Transactions on Pattern Analysis and Machine Intelligence
Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems)

Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems)
Feature Extraction: Foundations and Applications (Studies in Fuzziness and Soft Computing)

Feature Extraction: Foundations and Applications (Studies in Fuzziness and Soft Computing)
Introduction to Information Retrieval

Introduction to Information Retrieval

Quantified Score

Hi-index	0.00

Visualization

Abstract

In many applications, we deal with high dimensional datasets with different types of data. For instance, in text classification and information retrieval problems, we have large collections of documents. Each text is usually represented by a bag-of-words or similar representation, with a large number of features (terms). Many of these features may be irrelevant (or even detrimental) for the learning tasks. This excessive number of features carries the problem of memory usage in order to represent and deal with these collections, clearly showing the need for adequate techniques for feature representation, reduction, and selection, to both improve the classification accuracy and the memory requirements. In this paper, we propose a combined unsupervised feature discretization and feature selection technique. The experimental results on standard datasets show the efficiency of the proposed techniques as well as improvement over previous similar techniques.