Kernel-based data fusion for gene prioritization

Authors:
Tijl De Bie;Léon-Charles Tranchevent;Liesbeth M. M. van Oeffelen;Yves Moreau
Venue:
Bioinformatics
Year:
2007

Abstract

Motivation: Hunting disease genes is a problem of primary importance in biomedical research. Biologists usually approach this problem in two steps. First, a set of candidate genes is identified using traditional positional cloning or high-throughput genomics techniques. Second, these genes are investigated and validated in the wet lab, one by one. To speed up discovery and limit the number of costly wet-lab experiments, biologists must test the candidate genes starting with the most probable ones. So far, biologists have relied on literature studies, extensive queries to multiple databases and hunches about the expected properties of the disease gene to determine such an ordering. Recently, we introduced the data mining tool ENDEAVOUR (Aerts et al., 2006), which performs this task automatically. It draws on different genome-wide data sources, such as Gene Ontology, literature, microarray and sequence data, among others.

Results: In this article, we present a novel kernel method that operates in the same setting: given a number of different views on a set of training genes, it produces a prioritization of test genes. We also provide a thorough learning-theoretical analysis of the method's guaranteed performance. Finally, we apply the method to the disease data sets on which ENDEAVOUR (Aerts et al., 2006) has been benchmarked, and report a considerable improvement in empirical performance.

Availability: The MATLAB code used in the empirical results will be made publicly available.

Contact: tijl.debie@gmail.com or yves.moreau@esat.kuleuven.be
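The abstract describes the setting only at a high level: several data sources ("views") on a set of training genes are combined to rank test genes. The following is a minimal sketch of that general idea of kernel-based data fusion. It is not the authors' method. The fixed uniform kernel weights, the cosine-normalized linear kernels, the mean-similarity score, and all names and toy data below are illustrative assumptions.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]
# views[source_name][gene_id] -> feature vector of that gene in that data source
Views = Dict[str, Dict[str, Vector]]


def linear_kernel(u: Vector, v: Vector) -> float:
    """Plain inner product between two feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalized_kernel(u: Vector, v: Vector) -> float:
    """Cosine-normalized linear kernel k(u,v)/sqrt(k(u,u)k(v,v)); equals 1 when u == v (nonzero)."""
    denom = math.sqrt(linear_kernel(u, u) * linear_kernel(v, v))
    return linear_kernel(u, v) / denom if denom > 0 else 0.0


def fused_kernel(x: str, y: str, views: Views, weights: Dict[str, float]) -> float:
    """Weighted sum of the per-source normalized kernels (a simple form of kernel fusion)."""
    return sum(w * normalized_kernel(views[name][x], views[name][y])
               for name, w in weights.items())


def prioritize(training: List[str], candidates: List[str], views: Views,
               weights: Optional[Dict[str, float]] = None) -> List[Tuple[str, float]]:
    """Rank candidates by mean fused similarity to the training genes (hypothetical scoring rule)."""
    if weights is None:
        # Assumption: uniform weights; the paper's own weighting scheme is not reproduced here.
        weights = {name: 1.0 / len(views) for name in views}
    scored = []
    for c in candidates:
        score = sum(fused_kernel(c, t, views, weights) for t in training) / len(training)
        scored.append((c, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most probable candidate first
    return scored


# Hypothetical toy data with two made-up views; not taken from the paper.
TOY_VIEWS: Views = {
    "expression": {"T1": [1.0, 0.0], "T2": [0.9, 0.1], "C1": [1.0, 0.05], "C2": [0.0, 1.0]},
    "annotation": {"T1": [1, 1, 0], "T2": [1, 0, 0], "C1": [1, 1, 0], "C2": [0, 0, 1]},
}

if __name__ == "__main__":
    # C1 resembles the training genes in both views, so it should rank above C2.
    print(prioritize(["T1", "T2"], ["C1", "C2"], TOY_VIEWS))
```

Because each normalized kernel gives k(x, x) = 1 for nonzero vectors, the fused self-similarity is the same constant for every gene. Ranking by mean similarity to the training genes is therefore equivalent to ranking by distance to the training centroid in the fused feature space.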