Weighted kernel Fisher discriminant analysis for integrating heterogeneous data

Authors:
Jemila S. Hamid;Celia M. T. Greenwood;Joseph Beyene
Affiliations:
Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Canada and Pathology and Molecular Medicine, McMaster University, Hamilton, Canada;Lady Davis Research Institute, Jewish General Hospital, Montreal, Canada;Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Canada and Pathology and Molecular Medicine, McMaster University, Hamilton, Canada
Venue:
Computational Statistics & Data Analysis
Year:
2012

Abstract

Data integration is becoming an essential tool for coping with and making sense of the ever-increasing amount of biological data. Genomic data arise in various shapes and forms, including vectors, graphs, and sequences; it is therefore essential to carefully consider strategies that capture as much as possible of the information contained in each data type. The need to integrate heterogeneous data measured on the same individuals also arises in a wide range of clinical applications. We propose weighted kernel Fisher discriminant (wKFD) analysis for integrating heterogeneous data sets. The weights measure the relative importance of each of the data sets to be integrated. Simulation studies are conducted to assess the performance of the proposed method. The results show that the method performs very well, including in the presence of noisy data. We also illustrate the method using gene expression and clinical data from breast cancer patients. Weighted integration of heterogeneous data leads to improved predictive accuracy. The amount of improvement, however, depends on the quality and informativeness of each of the data sets being integrated. If a data set is of poor quality and/or non-informative, one should not expect a significant improvement from adding it to other informative data sets. Likewise, a substantial improvement might not be obtained if the data sets do not contain independent information, that is, if there is redundancy in the data.
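
As a rough illustration of the idea described in the abstract, the sketch below builds one kernel matrix per data source, combines them as a non-negative weighted sum, and fits a standard two-class kernel Fisher discriminant on the combined kernel. It is not the authors' implementation: the abstract does not say how the weights are chosen or which kernels are used, so the RBF kernels, the fixed weights (0.8, 0.2), the ridge term `reg`, the midpoint threshold, the synthetic data, and all function names are assumptions made for illustration only.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel between the rows of A and the rows of B."""
    sq_dist = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def combine_kernels(kernels, weights):
    """Weighted sum of per-source kernel matrices; weights are rescaled to sum to 1."""
    w = np.asarray(weights, dtype=float)
    if len(w) != len(kernels) or np.any(w < 0) or w.sum() <= 0:
        raise ValueError("need one non-negative weight per kernel, not all zero")
    w = w / w.sum()
    return sum(wi * K for wi, K in zip(w, kernels))


def fit_kfd(K, y, reg=1e-2):
    """Two-class kernel Fisher discriminant on a precomputed training kernel.

    y must contain the labels 0 and 1. `reg` is an assumed ridge term that
    keeps the within-class scatter matrix invertible.
    """
    y = np.asarray(y)
    n = K.shape[0]
    class_means = []
    N = np.zeros((n, n))  # within-class scatter in kernel space
    for c in (0, 1):
        Kc = K[:, y == c]                      # kernel columns of class c
        nc = Kc.shape[1]
        class_means.append(Kc.mean(axis=1))
        centering = np.eye(nc) - np.full((nc, nc), 1.0 / nc)
        N += Kc @ centering @ Kc.T
    alpha = np.linalg.solve(N + reg * np.eye(n), class_means[1] - class_means[0])
    # Assumed decision rule: midpoint between the two projected class means.
    threshold = 0.5 * (alpha @ class_means[0] + alpha @ class_means[1])
    return alpha, threshold


def predict_kfd(K_new, alpha, threshold):
    """K_new has shape (n_new, n_train); returns predicted labels in {0, 1}."""
    return (K_new @ alpha > threshold).astype(int)


# Synthetic toy data (hypothetical): one informative source, one pure-noise source.
rng = np.random.default_rng(0)
n = 40
y_train = np.repeat([0, 1], n // 2)
y_test = np.repeat([0, 1], n // 2)


def make_sources(y):
    shift = np.where(y == 1, 2.0, -2.0)[:, None]
    informative = rng.normal(size=(len(y), 2)) + shift  # class-dependent mean
    noisy = rng.normal(size=(len(y), 5))                # unrelated to the class
    return informative, noisy


info_tr, noise_tr = make_sources(y_train)
info_te, noise_te = make_sources(y_test)

weights = [0.8, 0.2]  # assumed fixed weights, not estimated from the data
K_train = combine_kernels(
    [rbf_kernel(info_tr, info_tr, 0.25), rbf_kernel(noise_tr, noise_tr, 0.1)], weights
)
K_test = combine_kernels(
    [rbf_kernel(info_te, info_tr, 0.25), rbf_kernel(noise_te, noise_tr, 0.1)], weights
)

alpha, threshold = fit_kfd(K_train, y_train)
accuracy = (predict_kfd(K_test, alpha, threshold) == y_test).mean()
print(f"held-out accuracy: {accuracy:.2f}")
```

In this toy setup the second source is pure noise, so giving it a small weight limits its influence on the combined kernel. In practice the weights would have to be estimated from the data, for example by cross-validation; that is again an assumption here, not necessarily the procedure used in the paper.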