A classification system typically consists of a feature extractor (preprocessor) and a classifier. These two components can be trained either independently or simultaneously. The former has an implementation advantage, since the extractor need only be trained once for use with any classifier, whereas the latter can minimize classification error directly. Certain criteria, such as Minimum Classification Error, are better suited to simultaneous training, whereas others, such as Mutual Information, are amenable to training the feature extractor either independently or simultaneously. Herein, an information-theoretic criterion is introduced and evaluated for training the extractor independently of the classifier. The proposed method uses a nonparametric estimate of Rényi's entropy to train the extractor by maximizing an approximation of the mutual information between the class labels and the output of the feature extractor. The evaluations show that the proposed method, even though it uses independent training, performs at least as well as three feature extraction methods that train the extractor and classifier simultaneously.
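The core ingredient of such criteria is a nonparametric estimate of Rényi's quadratic entropy, obtained from a Parzen density estimate with Gaussian kernels (the "information potential"). The sketch below is a minimal illustration of that estimator only, not the paper's implementation; the function name and the kernel width `sigma` are assumptions.

```python
import numpy as np

def renyi_quadratic_entropy(X, sigma=1.0):
    """Nonparametric estimate of Renyi's quadratic entropy,
    H2(X) = -log( (1/N^2) * sum_ij G(x_i - x_j; 2*sigma^2 I) ),
    where G is a Gaussian kernel. The inner double sum is the
    'information potential' of the sample."""
    X = np.atleast_2d(X)
    n, d = X.shape
    # Pairwise squared Euclidean distances between samples.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Convolving two Gaussian kernels of variance sigma^2 yields
    # a Gaussian of variance 2*sigma^2.
    var = 2.0 * sigma ** 2
    kernel = np.exp(-sq_dists / (2.0 * var)) / ((2.0 * np.pi * var) ** (d / 2))
    information_potential = kernel.mean()
    return -np.log(information_potential)
```

A tightly clustered sample has a larger information potential, hence a lower entropy estimate, than a widely spread one; a training criterion built on this estimator adjusts the extractor's parameters to shape these quantities per class.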