Predictive learning with sparse heterogeneous data

Authors:
Vladimir Cherkassky;Feng Cai;Lichen Liang
Affiliations:
Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN;Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN;Massachusetts General Hospital, Boston, MA
Venue:
IJCNN'09 Proceedings of the 2009 international joint conference on Neural Networks
Year:
2009

Citing 5
Cited 0

A theoretical framework for learning from a pool of disparate data sources

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Regularized multi--task learning

Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Estimation of Dependences Based on Empirical Data: Springer Series in Statistics (Springer Series in Statistics)

Estimation of Dependences Based on Empirical Data: Springer Series in Statistics (Springer Series in Statistics)
A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data

The Journal of Machine Learning Research
SVM+ regression and multi-task learning

IJCNN'09 Proceedings of the 2009 international joint conference on Neural Networks

Quantified Score

Hi-index	0.00

Visualization

Abstract

Many applications of machine learning involve sparse and heterogeneous data. For example, estimation of predictive (diagnostic) models using patients' data from clinical studies requires effective integration of genetic, clinical and demographic data. Typically all heterogeneous inputs are properly encoded and mapped onto a single feature vector, used for estimating (training) a predictive model. This approach, known as standard inductive learning, is used in most application studies. More recently, several new learning methodologies have emerged. In particular, when training data can be naturally separated into several groups (or structured), we can view learning (estimation) for each group as a separate task, leading to Multi-Task Learning framework. Similarly, a setting where training data is structured, but the objective is to estimate a single predictive model (for all groups), leads to Learning with Structured Data and SVM+ methodology recently proposed by Vapnik. This paper demonstrates advantages and limitations of these new data modeling approaches for modeling heterogeneous data (relative to standard inductive SVM) via empirical comparisons using several publicly available medical data sets.