Bias of importance measures for multi-valued attributes and solutions

  • Authors:
  • Houtao Deng; George Runger; Eugene Tuv

  • Affiliations:
  • Houtao Deng and George Runger: School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ; Eugene Tuv: Intel Corporation, Chandler, AZ

  • Venue:
  • ICANN'11: Proceedings of the 21st International Conference on Artificial Neural Networks, Part II
  • Year:
  • 2011

Abstract

Attribute importance measures for supervised learning are important for improving both learning accuracy and interpretability. However, it is well known that such measures can be biased when the predictor attributes have different numbers of values. We propose two methods to address this bias: OOBForest, which uses an out-of-bag sampling method, and pForest, which is based on the new concept of a partial permutation test. Existing research has considered the bias problem only among irrelevant attributes or among equally informative attributes, while we compare against existing methods in a setting where unequally informative attributes (with or without interactions) and irrelevant attributes co-exist. We observe that existing methods are not always reliable for multi-valued predictors, while the proposed methods compare favorably in our experiments.
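The cardinality bias the abstract refers to can be illustrated with a minimal sketch (this is an illustration of the bias itself, not the paper's OOBForest or pForest methods): the empirical Gini gain of a split tends to favor an irrelevant many-valued attribute over an irrelevant binary one, simply because more distinct values give more ways to overfit the sample. All names below are for illustration only.

```python
# Illustration of cardinality bias in impurity-based importance:
# both attributes are pure noise, yet the 32-valued one shows a
# larger apparent Gini gain than the 2-valued one.
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_gain(attr_values, labels):
    """Impurity reduction from splitting on every distinct attribute value."""
    groups = {}
    for v, y in zip(attr_values, labels):
        groups.setdefault(v, []).append(y)
    weighted = sum(len(g) / len(labels) * gini(g) for g in groups.values())
    return gini(labels) - weighted

random.seed(0)
n = 200
labels = [random.randint(0, 1) for _ in range(n)]   # random binary target
binary = [random.randint(0, 1) for _ in range(n)]   # irrelevant, 2 values
multi = [random.randint(0, 31) for _ in range(n)]   # irrelevant, 32 values

print("binary gain:", gini_gain(binary, labels))
print("multi  gain:", gini_gain(multi, labels))
```

Both attributes are independent of the target, so an unbiased importance measure should score them near zero and roughly equally; the many-valued attribute nonetheless receives a visibly larger gain, which is the bias that out-of-bag evaluation and permutation-based corrections aim to remove.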