The value of parsing as feature generation for gene mention recognition

  • Authors:
  • Larry H. Smith; W. John Wilbur

  • Affiliations:
  • Computational Biology Branch, National Center for Biotechnology Information, Room 6S614-N, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA; Computational Biology Branch, National Center for Biotechnology Information, Room 6S606, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA

  • Venue:
  • Journal of Biomedical Informatics
  • Year:
  • 2009

Abstract

We measured the extent to which information surrounding a base noun phrase reflects the presence of a gene name, and evaluated how well seven different parsers could supply such information. Using the GENETAG corpus as a gold standard, we applied machine learning to recognize, from its context, when a base noun phrase contained a gene name. Starting from the best lexical features, we assessed the gain from adding dependency or dependency-like relations obtained from a full-sentence parse. Features derived from parsers improved performance in this partial gene mention recognition task by a small but statistically significant amount. The parsers performed virtually identically in these experiments.
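To make the comparison concrete, the following is a minimal Python sketch of the two kinds of context features the abstract contrasts: lexical features drawn from words surrounding a base noun phrase, and parse-derived features from dependency arcs that cross the phrase boundary. The `Token` structure, the window size, the feature naming scheme, and the functions `lexical_features`, `dependency_features`, and `context_features` are hypothetical illustrations, not the authors' feature set; the sketch assumes a parser whose output gives each token a head index and a relation label.

```python
from dataclasses import dataclass
from typing import List, Set


# Hypothetical token representation (an assumption, not the paper's format):
# each token stores its head index (-1 for the sentence root) and the
# dependency label produced by some parser.
@dataclass
class Token:
    text: str
    head: int
    rel: str


def lexical_features(tokens: List[Token], start: int, end: int, window: int = 2) -> Set[str]:
    """Words within `window` positions left and right of the base noun
    phrase tokens[start:end]. Words inside the phrase are excluded, so only
    surrounding context is used."""
    feats: Set[str] = set()
    for k in range(1, window + 1):
        if start - k >= 0:
            feats.add(f"left{k}:{tokens[start - k].text.lower()}")
        if end - 1 + k < len(tokens):
            feats.add(f"right{k}:{tokens[end - 1 + k].text.lower()}")
    return feats


def dependency_features(tokens: List[Token], start: int, end: int) -> Set[str]:
    """Dependency arcs crossing the phrase boundary, each labeled with the
    relation and the word that lies outside the phrase."""
    feats: Set[str] = set()
    for i, tok in enumerate(tokens):
        inside = start <= i < end
        head_inside = start <= tok.head < end
        if inside and tok.head != -1 and not head_inside:
            # A phrase token attached to a head outside the phrase.
            feats.add(f"head:{tok.rel}:{tokens[tok.head].text.lower()}")
        elif not inside and head_inside:
            # An outside token attached to a head inside the phrase.
            feats.add(f"child:{tok.rel}:{tok.text.lower()}")
    return feats


def context_features(tokens: List[Token], start: int, end: int, use_parse: bool = True) -> Set[str]:
    """Lexical context features, optionally augmented with parse features."""
    feats = lexical_features(tokens, start, end)
    if use_parse:
        feats |= dependency_features(tokens, start, end)
    return feats


if __name__ == "__main__":
    # "the p53 protein binds DNA"; base noun phrase = "the p53 protein".
    sent = [
        Token("the", 2, "det"),
        Token("p53", 2, "compound"),
        Token("protein", 3, "nsubj"),
        Token("binds", -1, "root"),
        Token("DNA", 3, "obj"),
    ]
    print(sorted(context_features(sent, 0, 3)))
    # ['head:nsubj:binds', 'right1:binds', 'right2:dna']
```

Training one classifier on `context_features(..., use_parse=False)` and another on the full set would loosely mirror the lexical-only versus lexical-plus-parse comparison described above; the choice of learner is left open here, since the abstract does not specify it.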