Feature extraction by non-parametric mutual information maximization

  • Author:
  • Kari Torkkola

  • Affiliation:
  • Motorola Labs, 7700 South River Parkway, MD ML28, Tempe, AZ

  • Venue:
  • The Journal of Machine Learning Research
  • Year:
  • 2003

Abstract

We present a method for learning discriminative feature transforms, using the mutual information between class labels and transformed features as the criterion. Instead of the commonly used mutual information measure based on the Kullback-Leibler divergence, we use a quadratic divergence measure, which admits an efficient non-parametric implementation and requires no prior assumptions about class densities. In addition to linear transforms, we also discuss nonlinear transforms implemented as radial basis function networks. Extensions that reduce the computational complexity are also presented, and a comparison to greedy feature selection is made.
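The quadratic measure referred to above takes the form I_T(C, Y) = Σ_c ∫ (p(c, y) − P(c) p(y))² dy, which expands into three integrals of products of densities; with Gaussian Parzen-window estimates these integrals reduce, via the Gaussian convolution identity, to sums of pairwise kernel evaluations over the transformed samples. Below is a minimal NumPy sketch of that estimator under those assumptions. The function names (`gaussian_gram`, `quadratic_mi`), the single shared kernel width `sigma`, and the hard-label joint density are illustrative choices, not the paper's code.

```python
# Minimal sketch: Parzen-window estimate of quadratic mutual information
# between class labels and (transformed) features. Assumes Gaussian kernels
# of a single width sigma and hard class labels.
import numpy as np

def gaussian_gram(Y, sigma):
    """Pairwise kernel G(y_i - y_j, 2*sigma^2*I): the convolution of two
    Gaussian Parzen kernels of variance sigma^2 has variance 2*sigma^2."""
    d = Y.shape[1]
    sq = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    var = 2.0 * sigma ** 2
    return np.exp(-sq / (2.0 * var)) / ((2.0 * np.pi * var) ** (d / 2))

def quadratic_mi(Y, labels, sigma=1.0):
    """Estimate I_T(C, Y) = V_in + V_all - 2*V_btw from samples Y (N x d)
    and integer labels (length N). All three terms are sums over the
    pairwise kernel matrix."""
    N = len(Y)
    G = gaussian_gram(Y, sigma)
    classes, counts = np.unique(labels, return_counts=True)
    p_c = counts / N  # class priors P(c) = N_c / N
    # V_in: sum of within-class kernel blocks, from integrating p(c,y)^2.
    v_in = sum(G[np.ix_(labels == c, labels == c)].sum() for c in classes) / N**2
    # V_all: (sum_c P(c)^2) * integral of p(y)^2.
    v_all = (p_c ** 2).sum() * G.sum() / N**2
    # V_btw: sum_c P(c) * integral of p(c,y) p(y).
    v_btw = sum(pc * G[labels == c].sum() for c, pc in zip(classes, p_c)) / N**2
    return v_in + v_all - 2.0 * v_btw
```

In the linear case one would use this as the objective to maximize over a projection matrix W, e.g. evaluating `quadratic_mi(X @ W, labels)` for data X of shape (N, d) and W of shape (d, k), and ascending its gradient with respect to W; the gradient also has a closed form in terms of the same pairwise kernel differences, which is what makes the non-parametric approach efficient.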