Dynamically generating a protein entity dictionary using online resources

  • Authors:
  • Hongfang Liu; Zhangzhi Hu; Cathy Wu

  • Affiliations:
  • University of Maryland, Baltimore, MD; Georgetown University Medical Center, Washington, DC; Georgetown University Medical Center, Washington, DC

  • Venue:
  • ACLdemo '05 Proceedings of the ACL 2005 on Interactive poster and demonstration sessions
  • Year:
  • 2005

Abstract

With the overwhelming amount of biological knowledge stored in free text, natural language processing (NLP) has recently received much attention as a way to make information recorded in free text more manageable. One requirement for most NLP systems is the ability to accurately recognize biological entity terms in free text and to map these terms to the corresponding database records. This task is called biological named entity tagging. In this paper, we present a system that uses online resources to automatically construct a protein entity dictionary, which associates gene or protein names with UniProt identifiers. The system can be run periodically so that the dictionary stays up to date with these online resources. Using the online resources available on Dec. 25, 2004, we obtained 4,046,733 terms for 1,640,082 entities. The dictionary is available at http://biocreative.ifsm.umbc.edu/biothesaurus/.
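The abstract describes the dictionary as names associated with UniProt identifiers, with far more terms than entities, so one entity can have several names and one name may be shared by several entities. The following is a minimal sketch of such a many-to-many name-to-identifier index, not the authors' implementation: the normalization rule (lowercasing and collapsing hyphens and whitespace) is an assumption, and the identifiers and names in the example records are hypothetical placeholders rather than real UniProt entries.

```python
import re
from collections import defaultdict


def normalize(term: str) -> str:
    """Assumed normalization: lowercase, then collapse runs of hyphens/whitespace into one space."""
    return re.sub(r"[\s\-]+", " ", term.strip().lower())


def build_dictionary(records):
    """Map each normalized name to the set of identifiers it may refer to.

    `records` is an iterable of (identifier, name) pairs, for example
    harvested from several online resources; a name may map to many entities.
    """
    index = defaultdict(set)
    for identifier, name in records:
        key = normalize(name)
        if key:
            index[key].add(identifier)
    return index


def lookup(index, text_term):
    """Return the sorted candidate identifiers for a term found in free text."""
    return sorted(index.get(normalize(text_term), set()))


# Hypothetical records: identifiers and names are placeholders, not real UniProt entries.
records = [
    ("ID1", "Protein ABC-1"),
    ("ID1", "ABC1"),
    ("ID2", "protein abc 1"),
    ("ID3", "XYZ kinase"),
]

index = build_dictionary(records)
print(len(index))                      # 3 distinct terms
print(lookup(index, "protein ABC 1"))  # ['ID1', 'ID2']: an ambiguous term
```

Rebuilding the index from freshly downloaded records is one simple way to keep it current, in the spirit of running the system periodically; a name that maps to more than one identifier signals that a tagger would still need to disambiguate it.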