Semisupervised Learning for Molecular Profiling

Authors:
Cesare Furlanello;Maria Serafini;Stefano Merler;Giuseppe Jurman
Venue:
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year:
2005

Abstract

Class prediction and feature selection are two learning tasks that are tightly coupled in the search for molecular profiles from microarray data. Researchers have become aware of how easily selection bias can arise, and complex validation setups are required to avoid overly optimistic estimates of predictive accuracy and incorrect gene selections. This paper describes a semisupervised pattern discovery approach that uses the by-products of complete validation studies on experimental setups for gene profiling. In particular, we study the patterns of single-sample responses (sample-tracking profiles) to the gene selection process induced by typical supervised learning tasks in microarray studies. We derive sample-tracking profiles as the aggregated off-training evaluation of SVM models with gene panels of increasing size. Genes are ranked by E-RFE, an entropy-based variant of recursive feature elimination for support vector machines (RFE-SVM). A Dynamic Time Warping (DTW) algorithm is then applied to define a metric between sample-tracking profiles. Unsupervised clustering based on the DTW metric automates the discovery of outliers and of subtypes with different molecular profiles. Applications are described on synthetic data and in two gene expression studies.
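
As a rough illustration of the DTW step mentioned in the abstract, the sketch below computes a textbook dynamic-programming DTW distance between two numeric sequences. It is not the authors' implementation: the function names `dtw_distance` and `pairwise_dtw`, the absolute-difference local cost, and the example profiles (assumed here to be one score per gene panel size for each sample) are all hypothetical choices made for this example.

```python
def dtw_distance(a, b):
    """Textbook DTW between two numeric sequences.

    Assumption: the local cost is the absolute difference |a_i - b_j|;
    the abstract does not specify which local cost is used.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def pairwise_dtw(profiles):
    """Symmetric matrix of DTW distances between all pairs of profiles."""
    k = len(profiles)
    dist = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            dist[i][j] = dist[j][i] = dtw_distance(profiles[i], profiles[j])
    return dist


# Hypothetical sample-tracking profiles: one value per gene panel size.
profiles = [
    [0.0, 0.0, 0.0],  # sample A
    [1.0, 1.0, 1.0],  # sample B
    [0.0, 0.0, 0.0],  # sample C, identical to A
]
distances = pairwise_dtw(profiles)
```

A distance matrix of this kind could then be passed to any clustering routine that accepts precomputed dissimilarities, so that samples whose profiles lie far from all others stand out as candidate outliers.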