We demonstrate that bootstrapping a gene name recognizer for FlyBase curation from automatically annotated noisy text is more effective than fully supervised training of the recognizer on more general, manually annotated biomedical text. We present a new test set for this task based on an annotation scheme that distinguishes gene names from gene mentions, enabling more consistent annotation. Evaluating our recognizer on this test set indicates that performance on unseen genes is its main weakness. We evaluate extensions, designed to ameliorate this problem, to the technique used to generate training data.