Using heuristics, syntax and a local dynamic dictionary for protein name tagging

  • Authors: Gunnar Eriksson, Kristofer Franzén, Fredrik Olsson

  • Affiliation (all authors): Swedish Institute of Computer Science, Kista, Sweden

  • Venue: HLT '02 Proceedings of the second international conference on Human Language Technology Research
  • Year: 2002

Abstract

A prerequisite for all higher-level information extraction tasks is the identification of unknown names in text. This paper presents a method for extracting protein names from abstracts of articles in the biomedical domain. These names present several interesting difficulties because of their variant structural characteristics and the lack of common naming standards and fixed nomenclatures in the domain. The method presented is based on three main components: the use of heuristic rules and knowledge bases, the use of syntactic analysis, and the use of a document-local dynamic dictionary. The implementation of the method, the protein name tagger Yapex, is evaluated on a set of MEDLINE abstracts, using KeX, a freely available protein name tagger, as a reference system. The systems are evaluated under two different notions of correctness and are shown to perform equally well under the more liberal notion, while the Yapex system performs significantly better under the stricter notion.
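
The abstract does not spell out how the document-local dynamic dictionary works, so the following is only a minimal sketch of the general idea, not Yapex's implementation. It assumes a single hypothetical contextual cue (a capitalized token directly followed by the word "protein") as a stand-in for the heuristic rules, and the function names `build_local_dictionary` and `tag_with_dictionary` are invented for illustration. Names found through the cue in one sentence are collected per document and then used to tag other mentions in the same document that lack the cue.

```python
import re

# Hypothetical cue (an assumption, not a rule taken from the paper):
# a capitalized token directly followed by the word "protein".
CUE = re.compile(r"\b([A-Z][A-Za-z0-9]*(?:-[A-Za-z0-9]+)*)\s+protein\b")


def build_local_dictionary(sentences):
    """Collect names licensed by the cue anywhere in this one document."""
    names = set()
    for sentence in sentences:
        names.update(m.group(1) for m in CUE.finditer(sentence))
    return names


def tag_with_dictionary(sentences, names):
    """Tag every occurrence of a collected name, with or without the cue.

    Returns one list of (start, end, text) spans per sentence.
    """
    if not names:
        return [[] for _ in sentences]
    # Try longer names first so they win over names that are their prefixes.
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in ordered) + r")\b")
    return [
        [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(s)]
        for s in sentences
    ]


# Invented two-sentence document: only the first sentence contains the cue.
abstract = [
    "The Rad protein is expressed in muscle.",
    "Rad binds calmodulin in vitro.",
]
names = build_local_dictionary(abstract)
spans = tag_with_dictionary(abstract, names)
```

Because the dictionary is rebuilt for each document, a name learned in one abstract does not leak into another. This matters in a domain without fixed nomenclature, where the same short token can be a protein name in one article and an ordinary word or abbreviation in the next.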