A prerequisite for all higher-level information extraction tasks is the identification of unknown names in text. This paper presents a method for extracting protein names from abstracts of articles in the biomedical domain. These names pose several interesting difficulties because of their varied structural characteristics and the lack of common naming standards and fixed nomenclatures in the domain. The method is based on three main components: heuristic rules and knowledge bases, syntactic analysis, and a document-local dynamic dictionary. The implementation of the method, the protein name tagger Yapex, is evaluated on a set of MEDLINE abstracts, using KeX, a freely available protein name tagger, as a reference system. The systems are evaluated under two different notions of correctness. They are shown to perform equally well under the more liberal notion, while Yapex performs significantly better under the stricter notion.
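The abstract does not define the two notions of correctness, so the following is only a minimal sketch of how such an evaluation could be set up. It assumes, purely for illustration, that the stricter notion requires a predicted name to match a gold-standard name at exactly the same boundaries, and that the more liberal notion accepts any overlap. The names `is_match` and `precision_recall`, the character-offset representation, and the example spans are all hypothetical and are not taken from Yapex or KeX.

```python
from typing import List, Tuple

# A tagged name as (start, end) character offsets, end exclusive.
# This representation is an assumption made for the sketch.
Span = Tuple[int, int]


def is_match(pred: Span, gold: Span, strict: bool) -> bool:
    """Compare one predicted span with one gold span.

    strict=True  : boundaries must be identical (assumed stricter notion).
    strict=False : any overlap counts (assumed more liberal notion).
    """
    if strict:
        return pred == gold
    return pred[0] < gold[1] and gold[0] < pred[1]


def precision_recall(
    pred: List[Span], gold: List[Span], strict: bool
) -> Tuple[float, float]:
    """Precision over predicted spans and recall over gold spans."""
    matched_pred = sum(any(is_match(p, g, strict) for g in gold) for p in pred)
    matched_gold = sum(any(is_match(p, g, strict) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    return precision, recall


# Made-up spans: one exact hit, one partial hit, one spurious prediction.
gold_spans = [(0, 10), (25, 40)]
pred_spans = [(0, 10), (30, 40), (50, 55)]

strict_scores = precision_recall(pred_spans, gold_spans, strict=True)
liberal_scores = precision_recall(pred_spans, gold_spans, strict=False)
print("strict  (precision, recall):", strict_scores)
print("liberal (precision, recall):", liberal_scores)
```

With these made-up spans, the partially matching prediction counts only under the liberal setting, which is the kind of difference that can separate two taggers under one notion of correctness but not the other.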