Contrast and variability in gene names

  • Authors:
  • K. Bretonnel Cohen; George K. Acquaah-Mensah; Andrew E. Dolbey; Lawrence Hunter

  • Affiliations:
  • University of Colorado; University of Colorado; University of Colorado; University of Colorado

  • Venue:
  • BioMed '02 Proceedings of the ACL-02 workshop on Natural language processing in the biomedical domain - Volume 3
  • Year:
  • 2002

Abstract

We studied contrast and variability in a corpus of gene names to identify potential heuristics for use in performing entity identification in the molecular biology domain. Based on our findings, we developed heuristics for mapping weakly matching gene names to their official gene names. We then tested these heuristics against a large body of Medline abstracts and found that they can increase recall, with varying levels of precision. Our findings also underscored the importance of good information retrieval, and of the ability to distinguish among genes, proteins, RNA, and a variety of other referents, for performing entity identification with high precision.
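
The abstract does not list the heuristics themselves, so the following is only an illustrative sketch of what mapping a weakly matching name to an official name could look like. It assumes, purely for illustration, that letter case and separators such as hyphens and spaces do not distinguish one gene name from another. The function names (`normalize`, `build_index`, `map_to_official`) and the sample gene list are hypothetical and are not taken from the paper.

```python
import re


def normalize(name: str) -> str:
    """Collapse variation that this sketch ASSUMES is non-contrastive:
    letter case, and separators (whitespace, hyphens, underscores, slashes)."""
    return re.sub(r"[\s\-_/]+", "", name.lower())


def build_index(official_names):
    """Map each normalized form to the set of official names that share it."""
    index = {}
    for official in official_names:
        index.setdefault(normalize(official), set()).add(official)
    return index


def map_to_official(mention, index):
    """Return every official name whose normalized form matches the mention.

    A list is returned because collapsing variation can make distinct
    official names collide, which is one way precision can drop.
    """
    return sorted(index.get(normalize(mention), set()))


# Hypothetical official names, for illustration only.
official = ["BRCA1", "TNF-alpha", "IL2"]
index = build_index(official)

print(map_to_official("tnf alpha", index))
print(map_to_official("Il-2", index))
print(map_to_official("p53", index))
```

Loosening the match in this way lets more surface variants reach an official name, which is the recall gain the abstract describes. Returning all candidates rather than a single answer makes the accompanying precision cost visible instead of hiding it.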