High-dimensional micro-array data classification using minimum description length and domain expert knowledge

Authors:
Andrea Bosin;Nicoletta Dessì;Barbara Pes
Affiliations:
Dipartimento di Matematica e Informatica, Università degli Studi di Cagliari, Cagliari;Dipartimento di Matematica e Informatica, Università degli Studi di Cagliari, Cagliari;Dipartimento di Matematica e Informatica, Università degli Studi di Cagliari, Cagliari
Venue:
IEA/AIE'06 Proceedings of the 19th international conference on Advances in Applied Artificial Intelligence: industrial, Engineering and Other Applications of Applied Intelligent Systems
Year:
2006

Abstract

This paper evaluates three machine learning methods, namely Naïve Bayes (NB), Adaptive Bayesian Network (ABN), and Support Vector Machines (SVM), for multi-target classification of micro-array datasets that have a large feature space and very few samples. Using the Minimum Description Length (MDL) criterion to rank and select relevant features, the experiments investigate the accuracy and effectiveness of these methods in classifying many targets, and study how feature selection affects the sensitivity of each classifier. The paper also shows how a domain expert's knowledge makes it possible to decompose the multi-target classification into a set of binary classifications, one per target, which substantially improves accuracy. Empirical results support the effectiveness of the MDL criterion for selecting feature subsets, showing that it performs comparably to the entropy-based feature selection methods reported in earlier work.
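The abstract names MDL-based feature ranking and a per-target binary decomposition but gives no formulas. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. It uses a measure in the spirit of Kononenko's MDL score for attribute estimation: the code length of the class labels before partitioning by a feature, minus the code length after partitioning, divided by the number of samples. It also assumes that expression values have already been discretized into a few bins. It then ranks features separately for each target-versus-rest binary task. All function names (`mdl_score`, `rank_features`, `one_vs_rest_rankings`, and the helpers) are hypothetical. The NB, ABN and SVM classifiers themselves are not shown.

```python
# Sketch only: assumes discretized features and a Kononenko-style MDL measure.
# The exact MDL formula used in the paper is not given in the abstract.
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

LN2 = math.log(2)


def log2_multinomial(counts: Sequence[int]) -> float:
    """Bits to identify one arrangement of the labels: log2(n! / (k_1! ... k_C!))."""
    n = sum(counts)
    return (math.lgamma(n + 1) - sum(math.lgamma(k + 1) for k in counts)) / LN2


def log2_count_vectors(n: int, num_classes: int) -> float:
    """Bits to encode the class-count vector itself: log2 C(n + C - 1, C - 1)."""
    return (math.lgamma(n + num_classes) - math.lgamma(num_classes)
            - math.lgamma(n + 1)) / LN2


def label_code_length(labels: Sequence[Hashable], classes: Sequence[Hashable]) -> float:
    """Description length (bits) of a block of class labels over a fixed class set."""
    counts = Counter(labels)
    return (log2_multinomial([counts[c] for c in classes])
            + log2_count_vectors(len(labels), len(classes)))


def mdl_score(feature_values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Assumed Kononenko-style MDL measure: bits saved per sample when the labels
    are encoded separately for each (discretized) value of the feature.
    Labels must be mutually comparable because the class set is sorted."""
    classes = sorted(set(labels))
    prior = label_code_length(labels, classes)
    groups: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    post = sum(label_code_length(g, classes) for g in groups.values())
    return (prior - post) / len(labels)


def rank_features(X: Sequence[Sequence[Hashable]],
                  labels: Sequence[Hashable]) -> Tuple[List[int], List[float]]:
    """Return feature indices sorted by decreasing MDL score, plus the raw scores."""
    scores = [mdl_score([row[j] for row in X], labels) for j in range(len(X[0]))]
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order, scores


def one_vs_rest_rankings(X: Sequence[Sequence[Hashable]],
                         labels: Sequence[Hashable],
                         top_k: int) -> Dict[Hashable, List[int]]:
    """Hypothetical helper: one binary task per target (target vs. rest),
    each with its own MDL-ranked top-k feature subset."""
    rankings: Dict[Hashable, List[int]] = {}
    for target in sorted(set(labels)):
        binary = [1 if y == target else 0 for y in labels]
        order, _ = rank_features(X, binary)
        rankings[target] = order[:top_k]
    return rankings
```

A positive score means that splitting the samples by the feature's values shortens the encoding of the class labels. A negative score means the feature does not pay for the extra partition, so it would be ranked last. In the one-vs-rest setting each target gets its own top-k subset, so different targets may end up with different gene subsets feeding their binary classifiers.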