An exploration of mining gene expression mentions and their anatomical locations from biomedical text

  • Authors: Martin Gerner; Goran Nenadic; Casey M. Bergman
  • Affiliations: University of Manchester, Manchester, UK (all authors)
  • Venue: BioNLP '10 Proceedings of the 2010 Workshop on Biomedical Natural Language Processing
  • Year: 2010

Abstract

Here we explore mining gene expression data from the biomedical literature and present Gene Expression Text Miner (GETM), a tool for extracting information about the expression of genes and their anatomical locations from text. Given recognized gene mentions, GETM identifies mentions of anatomical locations and cell lines, and extracts text passages in which authors discuss the expression of a particular gene in specific anatomical locations or cell lines. This enables the automatic construction of expression profiles for both genes and anatomical locations. Evaluated against a manually extended version of the BioNLP '09 corpus, GETM achieved a precision of 58.8% and a recall of 23.8%. Application of GETM to MEDLINE and PubMed Central yielded over 700,000 gene expression mentions. This data set can be queried through a web interface, and should prove useful not only for researchers interested in the developmental regulation of specific genes, but also for database curators aiming to create structured repositories of gene expression information. The compiled tool, its source code, the manually annotated evaluation corpus and a search query interface to the data set extracted from MEDLINE and PubMed Central are available at http://getmproject.sourceforge.net/.
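
To make the extraction idea in the abstract concrete, the following is a minimal sketch of sentence-level co-occurrence matching: a gene mention, an expression trigger word and an anatomical term found in the same sentence are reported together, and the results are grouped into a per-gene expression profile. It is not GETM's actual implementation. The trigger list, the anatomy dictionary, the function names and the simple substring matching are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Toy dictionaries: hypothetical stand-ins, not GETM's actual resources.
TRIGGERS = ("expression", "expressed", "expresses")
ANATOMY = ("liver", "kidney", "HeLa cells")


@dataclass(frozen=True)
class ExpressionMention:
    gene: str
    location: str
    sentence: str


def find_expression_mentions(text, genes):
    """Return gene/location pairs that share a sentence with a trigger word."""
    mentions = []
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        lowered = sentence.lower()
        if not any(trigger in lowered for trigger in TRIGGERS):
            continue  # no expression trigger in this sentence
        for gene in genes:
            if gene.lower() not in lowered:
                continue
            for location in ANATOMY:
                if location.lower() in lowered:
                    mentions.append(ExpressionMention(gene, location, sentence))
    return mentions


def profile_by_gene(mentions):
    """Group extracted mentions into a gene -> set of locations profile."""
    profile = {}
    for mention in mentions:
        profile.setdefault(mention.gene, set()).add(mention.location)
    return profile


if __name__ == "__main__":
    sample = (
        "BRCA1 is expressed in the liver. "
        "TP53 was studied in kidney. "
        "TP53 expression was detected in HeLa cells."
    )
    found = find_expression_mentions(sample, ["BRCA1", "TP53"])
    print(profile_by_gene(found))
```

In this sketch the gene list is passed in by the caller, mirroring the abstract's statement that the tool is provided with recognized gene mentions. Swapping the grouping key from gene to location would give the complementary profile for anatomical locations.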