Using heuristics, syntax and a local dynamic dictionary for protein name tagging

Authors:
Gunnar Eriksson; Kristofer Franzén; Fredrik Olsson
Affiliations:
Swedish Institute of Computer Science, Kista, Sweden (all three authors)
Venue:
HLT '02 Proceedings of the second international conference on Human Language Technology Research
Year:
2002

Abstract

A prerequisite for all higher-level information extraction tasks is the identification of unknown names in text. This paper presents a method for extracting protein names from abstracts of articles in the biomedical domain. These names present several interesting difficulties because of their varied structural characteristics and the lack of common naming standards and fixed nomenclatures in the domain. The method is based on three main components: heuristic rules and knowledge bases, syntactic analysis, and a document-local dynamic dictionary. The implementation of the method, the protein name tagger Yapex, is evaluated on a set of MEDLINE abstracts, using KeX, a freely available protein name tagger, as a reference system. The systems are evaluated under two different notions of correctness. They perform equally well under the more liberal notion, while Yapex performs significantly better under the stricter one.
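The idea of a document-local dynamic dictionary can be illustrated with a minimal sketch. The abstract does not describe how Yapex builds or applies its dictionary, so everything below is an assumption made for illustration: the seed heuristic (a capitalized token immediately followed by the word "protein"), the names `SEED_PATTERN`, `find_seeds` and `tag_document`, and the example sentence are all hypothetical. The sketch only shows the general principle that a name recognized with confidence in one place can be reused to tag other mentions within the same document.

```python
import re

# Hypothetical seed heuristic (an assumption, not Yapex's actual rule set):
# a capitalized token immediately followed by the cue word "protein".
SEED_PATTERN = re.compile(r"\b([A-Z][A-Za-z0-9-]*)\s+protein\b")


def find_seeds(text):
    """Return the set of names that the seed heuristic finds in this text."""
    return set(SEED_PATTERN.findall(text))


def tag_document(text):
    """Tag every occurrence of each seed name within a single document.

    The dictionary is built from this document alone and is not shared
    with other documents, which is what makes it "document-local".
    Returns a sorted list of (start, end, name) character spans.
    """
    local_dictionary = find_seeds(text)
    spans = []
    for name in local_dictionary:
        # Match the name as a whole word wherever it occurs in the text.
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            spans.append((match.start(), match.end(), name))
    return sorted(spans)


if __name__ == "__main__":
    example = (
        "The Sos protein binds the receptor. "
        "Sos is then recruited to the membrane."
    )
    for start, end, name in tag_document(example):
        print(start, end, name)
```

In the example sentence, only the first mention of "Sos" appears next to the cue word, but the second, bare mention is tagged as well because the name was added to the dictionary for that document. A text with no cue word yields an empty dictionary and therefore no tags, which reflects the trade-off of relying on local evidence alone.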