Role of centrality in network-based prioritization of disease genes

Authors:
Sinan Erten;Mehmet Koyutürk
Affiliations:
Dept. of Electrical Engineering & Computer Science, Case Western Reserve University, Cleveland, OH;Dept. of Electrical Engineering & Computer Science, Case Western Reserve University, Cleveland, OH
Venue:
EvoBIO'10 Proceedings of the 8th European conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics
Year:
2010

Abstract

High-throughput molecular interaction data have been used effectively to prioritize candidate genes that are linked to a disease, based on the notion that the products of genes associated with similar diseases are likely to interact with each other heavily in a network of protein-protein interactions (PPIs). An important challenge for these applications, however, is the incomplete and noisy nature of PPI data. Random walk and network propagation based methods alleviate these problems to a certain extent, by considering indirect interactions and multiplicity of paths. However, as we demonstrate in this paper, such methods are likely to favor highly connected genes, making prioritization sensitive to the skewed degree distribution of PPI networks, as well as ascertainment bias in available interaction and disease association data. Here, we propose several statistical correction schemes that aim to account for the degree distribution of known disease and candidate genes. We show that, while the proposed schemes are very effective in detecting loosely connected disease genes that are missed by existing approaches, this improvement might come at the price of more false negatives for highly connected genes. Motivated by these results, we develop uniform prioritization methods that effectively integrate existing methods with the proposed statistical correction schemes. Comprehensive experimental results on the Online Mendelian Inheritance in Man (OMIM) database show that the resulting hybrid schemes outperform existing methods in prioritizing candidate disease genes.
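
The degree bias described in the abstract can be illustrated with a minimal sketch. The toy network, the gene names, the restart probability, and the divide-by-degree correction below are all assumptions made for illustration; the division by degree is only a simple stand-in and is not one of the statistical correction schemes proposed in the paper.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, iters=200):
    """Iterate a random walk with restart on an undirected graph.

    adj   : dict mapping each node to the set of its neighbours
    seeds : set of known disease genes (the restart set)
    """
    nodes = sorted(adj)
    # Restart vector: uniform over the seed genes.
    r = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(r)
    for _ in range(iters):
        nxt = {}
        for v in nodes:
            # Each neighbour u spreads its current score evenly over its edges.
            inflow = sum(p[u] / len(adj[u]) for u in adj[v])
            nxt[v] = (1 - restart) * inflow + restart * r[v]
        p = nxt
    return p


# Hypothetical toy network: seed gene S touches both a hub H and a
# loosely connected candidate C; H also has five extra neighbours.
edges = [("S", "H"), ("S", "C")] + [("H", f"L{i}") for i in range(1, 6)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

raw = random_walk_with_restart(adj, seeds={"S"})

# Illustrative stand-in correction (an assumption, not the paper's method):
# divide each score by the node's degree.
corrected = {v: raw[v] / len(adj[v]) for v in adj}

for gene in ("H", "C"):
    print(gene, round(raw[gene], 4), round(corrected[gene], 4))
```

In this toy setting H and C are both direct neighbours of the seed, yet the uncorrected walk ranks the hub H above C, while the degree-normalized score reverses the order. This mirrors the trade-off the abstract reports: correcting for degree helps loosely connected genes but can penalize highly connected ones.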