Automatic acquisition of huge training data for bio-medical named entity recognition

Authors:
Yu Usami;Han-Cheol Cho;Naoaki Okazaki;Jun'ichi Tsujii
Affiliations:
The University of Tokyo, Tokyo, Japan;The University of Tokyo, Tokyo, Japan;Tohoku University, Sendai, Japan;Microsoft Research Asia, Beijing, China
Venue:
BioNLP '11 Proceedings of BioNLP 2011 Workshop
Year:
2011

Abstract

Named Entity Recognition (NER) is an important first step for BioNLP tasks, e.g., gene normalization and event extraction. To achieve high performance, recent NER systems employ supervised machine learning techniques, which require a manually annotated corpus in which every mention of the desired semantic types is annotated. However, a great amount of human effort is necessary to build and maintain an annotated corpus. This study explores a method to build a high-performance NER system without a manually annotated corpus, using instead a comprehensive lexical database that stores numerous expressions of semantic types together with a huge amount of unannotated text. We underscore the effectiveness of our approach by comparing the performance of NER systems trained on automatically acquired training data and on a manually annotated corpus.
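
The general idea of deriving training labels from a lexical database and unannotated text can be illustrated with a minimal sketch. This is not the authors' procedure, which the abstract does not detail; it assumes a simple greedy longest-match lookup and BIO tagging, and the function name `label_with_lexicon`, the toy `LEXICON`, the `protein` type label, and the `max_len` parameter are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: lowercased token tuples mapped to a semantic type.
LEXICON: Dict[Tuple[str, ...], str] = {
    ("tumor", "necrosis", "factor"): "protein",
    ("il-2",): "protein",
}


def label_with_lexicon(
    tokens: List[str],
    lexicon: Dict[Tuple[str, ...], str],
    max_len: int = 5,
) -> List[str]:
    """Assign BIO tags by greedy longest-match lookup against the lexicon."""
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in lexicon:
                sem_type = lexicon[key]
                tags[i] = "B-" + sem_type
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + sem_type
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


tokens = "Tumor necrosis factor induces IL-2 expression".split()
tags = label_with_lexicon(tokens, LEXICON)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Sentences labeled this way could then serve as input to a supervised tagger. Plain dictionary matching of this kind is noisy (ambiguous or unlisted names are mislabeled), so any practical pipeline would need additional filtering.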