Enhancing performance of protein name recognizers using collocation

Authors:
Wen-Juan Hou;Hsin-Hsi Chen
Affiliations:
National Taiwan University, Taipei, Taiwan;National Taiwan University, Taipei, Taiwan
Venue:
BioMed '03 Proceedings of the ACL 2003 workshop on Natural language processing in biomedicine - Volume 13
Year:
2003

Citing 8
Cited 5

Lexical analysis and stoplists

Information retrieval
Foundations of statistical natural language processing

Foundations of statistical natural language processing
Automatic Extraction of Biological Information from Scientific Text: Protein-Protein Interactions

Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology
Constructing Biological Knowledge Bases by Extracting Information from Text Sources

Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology
The GENIA project: corpus-based knowledge acquisition and information extraction from genome research papers

EACL '99 Proceedings of the ninth conference on European chapter of the Association for Computational Linguistics
Extracting the names of genes and gene products with a hidden Markov model

COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
Identification and classification of proper nouns in Chinese texts

COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1
Notions of correctness when evaluating protein name taggers

COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1

Enhancing performance of protein and gene name recognizers with filtering and integration strategies

Journal of Biomedical Informatics - Special issue: Named entity recognition in biomedicine
Term identification in the biomedical literature

Journal of Biomedical Informatics - Special issue: Named entity recognition in biomedicine
Biomedical knowledge navigation by literature clustering

Journal of Biomedical Informatics
Annotating multiple types of biomedical entities: a single word classification approach

JNLPBA '04 Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications
Empirical textual mining to protein entities recognition from pubmed corpus

NLDB'05 Proceedings of the 10th international conference on Natural Language Processing and Information Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Named entity recognition is a fundamental task in biological relationship mining. This paper employs protein collocates extracted from a biological corpus to enhance the performance of protein name recognizers. Yapex and KeX are taken as examples. The precision of Yapex is increased from 70.90% to 81.94% at the low expense of recall rate (i.e., only decrease 2.39%) when collocates are incorporated. We also integrate the results proposed by Yapex and KeX, and employs collocates to filter the merged results. Because the candidates suggested by these two systems may be inconsistent, i.e., overlap in partial, one of them is considered as a basis. The experiments show that Yapex-based integration is better than KeX-based integration.