A Hybrid Method for Feature Selection Based on Mutual Information and Canonical Correlation Analysis
Proceedings of the 20th International Conference on Pattern Recognition (ICPR 2010)
Feature selection is a critical step in many artificial intelligence and pattern recognition problems. Shannon's Mutual Information (MI) is a classical and widely used measure of dependence that serves well as a feature selection criterion. However, because it measures dependence on average, under-sampled classes (rare events) can be overlooked by this measure, causing critical false negatives: a relevant feature that is highly predictive of a rare but important class may be missed. Shannon's MI also requires a well-sampled database, which is not typical of many fields of modern science (such as biomedicine), where the number of samples to learn from is limited or, at least, not all classes of the target function (such as certain phenotypes) are well sampled. Kernel Canonical Correlation Analysis (KCCA), on the other hand, is a nonlinear correlation measure used effectively to detect independence, but its use for feature selection or ranking is limited because its formulation does not quantify the amount of information (entropy) carried by the dependence. In this paper, we propose a hybrid measure of relevance, Predictive Mutual Information (PMI), which is based on MI but, as in KCCA, also accounts for the predictability of the signals from each other. We show that PMI has better feature detection capability than MI, especially in catching suspicious coincidences that are rare but potentially important, not only for experimental studies but also for building computational models. We demonstrate the usefulness of PMI, and its superiority over MI, on both toy and real datasets.
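The abstract does not give a formal definition of PMI, but the averaging limitation of plain Shannon MI that motivates it is easy to reproduce. The sketch below is a minimal illustration, not the paper's method: feature names, class proportions, and the noise rate are hypothetical choices for exposition. It computes Shannon MI for two discrete features: one that perfectly identifies a rare class, and one that is only a noisy indicator of a majority class.

```python
import numpy as np

def mutual_information(x, y):
    """Shannon mutual information I(X;Y) in bits for two discrete arrays."""
    x, y = np.asarray(x), np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            p_xy = np.mean((x == xv) & (y == yv))  # joint probability estimate
            p_x = np.mean(x == xv)                 # marginal of X
            p_y = np.mean(y == yv)                 # marginal of Y
            if p_xy > 0:
                mi += p_xy * np.log2(p_xy / (p_x * p_y))
    return mi

# Hypothetical toy data: three classes, one of which is rare (2% of samples).
rng = np.random.default_rng(0)
n = 5000
labels = rng.choice([0, 1, 2], size=n, p=[0.49, 0.49, 0.02])

# A feature that fires exactly on the rare class (perfectly predictive of it).
feature_rare = (labels == 2).astype(int)

# A feature tied to a majority class, corrupted with 15% label noise.
feature_common = (labels == 0).astype(int)
flip = rng.random(n) < 0.15
feature_common[flip] = 1 - feature_common[flip]

print(mutual_information(feature_rare, labels))    # ~0.14 bits
print(mutual_information(feature_common, labels))  # ~0.39 bits
```

On a typical run, the noisy majority-class feature scores roughly 0.39 bits while the perfectly predictive rare-class feature scores only about 0.14 bits, so an MI-based ranking would place the rare-event feature lower. This is precisely the kind of false negative the proposed PMI measure is designed to avoid.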