Identifying gene-disease associations using centrality on a literature mined gene-interaction network

Authors:
Arzucan Özgür;Thuy Vu;Güneş Erkan;Dragomir R. Radev
Venue:
Bioinformatics
Year:
2008

Citing 0
Cited 6

Ontologizing concept maps using graph theory
Proceedings of the 2011 ACM Symposium on Applied Computing

Towards open ontology learning and filtering
Information Systems

Identifying disease diagnosis factors by proximity-based mining of medical texts
ACIIDS'11 Proceedings of the Third international conference on Intelligent information and database systems - Volume Part II

Integrating large, disparate biomedical ontologies to boost organ development network connectivity
DILS'12 Proceedings of the 8th international conference on Data Integration in the Life Sciences

Improving Protein-Protein Interaction Pair Ranking with an Integrated Global Association Score
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Top-k Similar Graph Matching Using TraM in Biological Networks
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Quantified Score

Hi-index	3.84

Abstract

Motivation: Understanding the role of genetics in diseases is one of the most important aims of the biological sciences. The completion of the Human Genome Project has led to a rapid increase in the number of publications in this area. However, the coverage of curated databases that provide information manually extracted from the literature is limited. Another challenge is that determining disease-related genes requires laborious experiments. Therefore, predicting good candidate genes before experimental analysis will save time and effort. We introduce an automatic approach based on text mining and network analysis to predict gene-disease associations. We collected an initial set of known disease-related genes and built an interaction network by automatic literature mining based on dependency parsing and support vector machines. Our hypothesis is that the central genes in this disease-specific network are likely to be related to the disease. We used the degree, eigenvector, betweenness and closeness centrality metrics to rank the genes in the network.

Results: The proposed approach can be used to extract known and to infer unknown gene-disease associations. We evaluated the approach for prostate cancer. Eigenvector and degree centrality achieved high accuracy. A total of 95% of the top 20 genes ranked by these methods are confirmed to be related to prostate cancer. On the other hand, betweenness and closeness centrality predicted more genes whose relation to the disease is currently unknown and are candidates for experimental study.

Availability: A web-based system for browsing the disease-specific gene-interaction networks is available at: http://gin.ncibi.org

Contact: radev@umich.edu
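
The centrality-based ranking described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the node names (G1 to G5) and edges are hypothetical placeholders rather than mined interactions, and only two of the four metrics (degree and closeness) are shown, computed on an unweighted, undirected graph using the standard library.

```python
from collections import deque

# Hypothetical toy network: names and edges are illustrative placeholders,
# not gene interactions taken from the paper.
EDGES = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3"), ("G4", "G5")]


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def closeness_centrality(adj):
    """Reachable-node count divided by the sum of shortest-path distances.

    Distances come from a breadth-first search. For a connected graph this
    equals the usual (n - 1) / sum-of-distances definition.
    """
    scores = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


def rank(scores):
    """Order nodes by descending score, breaking ties alphabetically."""
    return sorted(scores, key=lambda g: (-scores[g], g))


adjacency = build_adjacency(EDGES)
degree_scores = degree_centrality(adjacency)
closeness_scores = closeness_centrality(adjacency)
degree_ranking = rank(degree_scores)
closeness_ranking = rank(closeness_scores)

print("degree:   ", degree_ranking)
print("closeness:", closeness_ranking)
```

In this toy graph the two metrics agree on the top node but order the remaining nodes differently, which is the kind of divergence between metrics that the Results paragraph reports. Eigenvector and betweenness centrality are omitted here for brevity.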