Protein name recognition aims to detect every protein name appearing in a PubMed abstract. The task is not simple, because the graphic word boundary (the space separator) assumed in conventional preprocessing does not necessarily coincide with the protein name boundary. This boundary disagreement, caused by tokenization ambiguity, has usually been ignored in conventional preprocessing of general English. In this paper, we argue that boundary disagreement poses serious limitations for biomedical English text processing in general, and for protein name recognition in particular. Our key idea for dealing with boundary disagreement is to apply techniques from Japanese morphological analysis, where text has no explicit word boundaries. Evaluating the proposed method on GENIA corpus 3.02, we obtain an F-measure of 69.01 under a strict criterion and 79.32 under a relaxed criterion. This result is comparable to other published work on protein name recognition and is achieved without manually prepared, ad hoc feature engineering. Furthermore, compared with conventional preprocessing, using morphological analysis as preprocessing improves protein name recognition performance and reduces execution time.
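To make the boundary disagreement concrete, the following is a minimal sketch, not the method of the paper, whose details the abstract does not give. It contrasts whitespace tokenization with a dictionary-driven, minimum-cost segmentation of the kind commonly used in Japanese morphological analyzers. The function name `segment`, the toy lexicon, the example phrase, and all cost values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def segment(text: str, lexicon: Dict[str, float], unknown_cost: float = 10.0) -> List[str]:
    """Return the minimum-cost split of `text` into lexicon entries.

    Whitespace is treated as an ordinary character, not as a boundary cue.
    Characters not covered by any lexicon entry fall back to single-character
    pieces with a high (hypothetical) unknown-word cost.
    """
    n = len(text)
    best = [float("inf")] * (n + 1)  # best[i]: cheapest cost to cover text[:i]
    back = [0] * (n + 1)             # back[i]: start of the last piece ending at i
    best[0] = 0.0
    max_len = max(map(len, lexicon), default=1)

    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in lexicon:
                cost = lexicon[piece]
            elif end - start == 1:
                cost = unknown_cost  # unknown single character
            else:
                continue
            if best[start] + cost < best[end]:
                best[end] = best[start] + cost
                back[end] = start

    # Follow the back-pointers from the end of the text to recover the pieces.
    pieces: List[str] = []
    pos = n
    while pos > 0:
        pieces.append(text[back[pos]:pos])
        pos = back[pos]
    return pieces[::-1]


# Toy lexicon with made-up costs (assumption, for illustration only).
LEXICON = {"SLP-76": 1.0, "-": 1.0, "associated": 1.0, " ": 0.5, "protein": 1.0}
TEXT = "SLP-76-associated protein"

whitespace_tokens = TEXT.split()
lattice_tokens = segment(TEXT, LEXICON)

print(whitespace_tokens)
print(lattice_tokens)
```

With whitespace tokenization, the hypothetical protein name `SLP-76` is buried inside the token `SLP-76-associated`, so no token boundary coincides with the end of the name. The dictionary-driven segmentation can place a boundary there because it never assumes that spaces are the only places where a word may end.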