A text-mining technique for extracting gene-disease associations from the biomedical literature

  • Authors:
  • Hisham Al-Mubaid;Rajit K. Singh

  • Affiliations:
  • School of Science and Computer Engineering, University of Houston-Clear Lake, 2700 Bay Area Blvd, Box 40, Houston, Texas 77058, USA.;School of Science and Computer Engineering, University of Houston-Clear Lake, 2700 Bay Area Blvd, Box 40, Houston, Texas 77058, USA

  • Venue:
  • International Journal of Bioinformatics Research and Applications
  • Year:
  • 2010

Abstract

We propose a new text-mining technique for identifying associations between biological entities, specifically gene-disease associations, from the biomedical literature. The proposed method is simple and straightforward: it uses two sets of documents (a positive set and a negative set) and utilises the concepts of expectation (ex), evidence (ev), and Z-scores to combine positive and negative evidence when determining the significant gene-disease associations in Medline documents. Moreover, the method offers an efficient way to handle gene names, aliases, symbols, and abbreviations. We evaluated the method on discovering gene-disease associations from the literature, and the experimental results are impressive. We verified our results and confirmed the effectiveness of the proposed technique in several ways. For example, we ran the technique on a number of discovered and known gene-disease relationships. Our method was able to discover associations between genes and various diseases, including amyotrophic lateral sclerosis, tuberous sclerosis, autism, homocystinuria, bipolar disorder, and atherosclerosis.
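The abstract does not give the formulas behind expectation, evidence, and the Z-score, so the following Python sketch only illustrates one plausible reading and should not be taken as the authors' implementation. It assumes that evidence (ev) is the number of positive-set documents mentioning a gene, that expectation (ex) is the count predicted from the gene's smoothed mention rate in the negative set, and that the Z-score is a binomial standardisation of the difference. The function names, the alias dictionary, and the smoothing constants are all hypothetical.

```python
import math
import re
from collections import Counter


def count_docs_mentioning(docs, alias_map):
    """Count, per canonical gene, the documents that mention any of its aliases.

    Matching is on single upper-cased tokens, so multi-word aliases are not
    handled in this sketch.
    """
    counts = Counter()
    for text in docs:
        tokens = set(re.findall(r"[A-Za-z0-9\-]+", text.upper()))
        for gene, aliases in alias_map.items():
            if any(alias.upper() in tokens for alias in aliases):
                counts[gene] += 1
    return counts


def z_scores(positive_docs, negative_docs, alias_map):
    """Score each gene by how far its positive-set count exceeds expectation.

    Assumed definitions (not taken from the paper):
      ev = documents in the positive set that mention the gene
      ex = n_pos * p, where p is the smoothed mention rate in the negative set
      z  = (ev - ex) / sqrt(n_pos * p * (1 - p))
    """
    pos = count_docs_mentioning(positive_docs, alias_map)
    neg = count_docs_mentioning(negative_docs, alias_map)
    n_pos, n_neg = len(positive_docs), len(negative_docs)
    scores = {}
    for gene in alias_map:
        # Add-one smoothing keeps p strictly between 0 and 1.
        p = (neg[gene] + 1) / (n_neg + 2)
        ex = n_pos * p
        ev = pos[gene]
        scores[gene] = (ev - ex) / math.sqrt(n_pos * p * (1 - p))
    return scores
```

Under these assumptions, a gene mentioned often in documents about the disease (the positive set) but rarely in unrelated documents (the negative set) receives a large positive Z-score, while a gene that is equally or more common in the negative set scores near or below zero. Mapping every alias to one canonical symbol before counting is one simple way to pool evidence across gene names, symbols, and abbreviations.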