Model mining for robust feature selection

Authors:
Adam Woznica; Phong Nguyen; Alexandros Kalousis
Affiliations:
University of Geneva, Carouge, Switzerland; University of Geneva, Carouge, Switzerland; University of Applied Sciences, Carouge, Switzerland
Venue:
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2012

Abstract

A common problem with most feature selection methods is that the feature sets (models) they produce are often unstable with respect to slight variations in the training data. Several authors have tried to improve feature selection stability with ensemble methods that aggregate different feature sets into a single model. However, existing ensemble feature selection methods suffer from two main shortcomings: (i) the aggregation treats features independently and does not account for their interactions, and (ii) only a single feature set is returned, even though in many applications there may be several feature sets, potentially redundant, with similar information content. In this work we address these two limitations. We present a general framework in which we mine over the different feature models produced from a given dataset in order to extract patterns over the models. We use these patterns to derive more complex feature model aggregation strategies that account for feature interactions, and to identify core and distinct feature models. We conduct an extensive experimental evaluation of the proposed framework and demonstrate its effectiveness on a number of high-dimensional problems from the fields of biology and text mining.
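
To make the idea of mining patterns over feature models more concrete, the following is a minimal sketch and not the authors' algorithm. It assumes that each model is simply a set of feature names and that a "pattern" is a group of features that co-occur in a large fraction of the models. The function name `frequent_feature_patterns`, the parameters `min_support` and `max_size`, and the toy feature names are all hypothetical; the abstract does not specify which mining procedure is used.

```python
from collections import Counter
from itertools import combinations


def frequent_feature_patterns(models, min_support=0.6, max_size=2):
    """Return feature groups that co-occur in at least `min_support` of the models.

    Illustrative assumption: a pattern is any feature combination of up to
    `max_size` features whose relative frequency over the models is high enough.
    """
    counts = Counter()
    for model in models:
        features = sorted(set(model))  # sort so each combination has one canonical form
        for size in range(1, max_size + 1):
            counts.update(combinations(features, size))
    n = len(models)
    return {
        pattern: count / n
        for pattern, count in counts.items()
        if count / n >= min_support
    }


# Hypothetical feature sets, e.g. obtained by running a selector on
# five resamples of the same dataset.
models = [
    {"g1", "g2", "g7"},
    {"g1", "g2", "g5"},
    {"g1", "g2", "g7"},
    {"g3", "g4", "g7"},
    {"g1", "g2", "g4"},
]

patterns = frequent_feature_patterns(models, min_support=0.6, max_size=2)
for pattern, support in sorted(patterns.items()):
    print(pattern, support)
```

In this toy input the pair `("g1", "g2")` is reported as a pattern because the two features are always selected together, which is information that an aggregation based only on individual feature frequencies would not capture.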