Identification of gene function using prediction by partial matching (PPM) language models

  • Authors:
  • Malika Mahoui; William John Teahan; Arvind Kumar Thirumalaiswamy Sekhar; Satyasaibabu Chilukuri

  • Affiliations:
  • IUPUI, Indianapolis, IN, USA; University of Wales, Bangor, Wales, United Kingdom; Dow AgroSciences, Indianapolis, IN, USA; IUPUI, Indianapolis, IN, USA

  • Venue:
  • Proceedings of the 17th ACM conference on Information and knowledge management
  • Year:
  • 2008

Abstract

In this paper, we describe the use of compression-based text encoding and prediction by partial matching (PPM) language modeling to identify gene functions within the abstracts of biomedical papers. The National Center for Biotechnology Information (NCBI) maintains GeneRIF (Gene Reference Into Function), a curated collection of concise statements describing gene function, each linked to one abstract in a subset of PubMed. We use GeneRIF to evaluate the effectiveness of our technique. We discuss the methodology adopted to construct the models that enable the Text Mining Toolkit to distinguish gene-function text from the rest of the abstract (non-gene-function text). We also describe the similarity-based approach we apply to the list of automatically annotated functions to select the gene function most representative of the paper. The results indicate that our combined approach achieves high precision and recall when identifying gene functions in scientific abstracts, which suggests it could also be used to extract other entities embedded in scientific text.
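The core idea behind PPM-based classification can be illustrated with a minimal sketch. One model is trained per class, and a text segment is assigned to the class whose model encodes it in the fewest bits. The sketch makes several assumptions that the abstract does not state: character-level models, a maximum context order of 3, escape method C, and no exclusions, which keeps the code short but wastes some probability mass. The class names `gene_function` and `other` and the two toy training strings are hypothetical. This is not the authors' Text Mining Toolkit, and the similarity-based selection step is not shown.

```python
import math
from collections import Counter, defaultdict


class PPMModel:
    """Simplified character-level PPM model (escape method C, no exclusions)."""

    def __init__(self, order=3, alphabet_size=256):
        self.order = order                  # assumed maximum context length
        self.alphabet_size = alphabet_size  # size of the order -1 uniform fallback
        # counts[context][symbol] for every context of length 0..order
        self.counts = defaultdict(Counter)

    def train(self, text):
        """Record each character under all preceding contexts of length 0..order."""
        for i, ch in enumerate(text):
            for k in range(self.order + 1):
                if i - k < 0:
                    break
                self.counts[text[i - k:i]][ch] += 1

    def _prob(self, context, ch):
        """Probability of ch: try the longest context first, escaping to shorter ones."""
        escape_mass = 1.0
        for k in range(len(context), -1, -1):
            table = self.counts.get(context[len(context) - k:])
            if not table:
                continue  # unseen context: fall through to a shorter one
            total, distinct = sum(table.values()), len(table)
            if ch in table:
                return escape_mass * table[ch] / (total + distinct)
            # method C: escape probability is distinct / (total + distinct)
            escape_mass *= distinct / (total + distinct)
        # order -1: uniform over the assumed alphabet
        return escape_mass / self.alphabet_size

    def code_length(self, text):
        """Bits needed to encode text under this (static) model."""
        bits = 0.0
        for i, ch in enumerate(text):
            context = text[max(0, i - self.order):i]
            bits -= math.log2(self._prob(context, ch))
        return bits


def classify(text, models):
    """Return the label whose model compresses text into the fewest bits."""
    return min(models, key=lambda label: models[label].code_length(text))


# Hypothetical toy training data, for illustration only.
models = {"gene_function": PPMModel(), "other": PPMModel()}
models["gene_function"].train(
    "BRCA1 regulates expression of target genes and is required for DNA repair.")
models["other"].train(
    "Samples were collected from patients and analysed using standard methods.")

print(classify("regulates expression of genes", models))
print(classify("samples were collected from patients", models))
```

In a realistic setting, the gene-function model would be trained on GeneRIF-style sentences and the other model on the remaining abstract text. A full PPM implementation would normally apply exclusions and adaptive coding as well.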