Bootstrapping and evaluating named entity recognition in the biomedical domain

Authors:
Andreas Vlachos;Caroline Gasperin
Affiliations:
University of Cambridge, Cambridge, UK;University of Cambridge, Cambridge, UK
Venue:
BioNLP '06 Proceedings of the Workshop on Linking Natural Language Processing and Biology: Towards Deeper Biological Literature Analysis
Year:
2006

Citing 7

Assessing agreement on classification tasks: the kappa statistic
Computational Linguistics

Combining labeled and unlabeled data with co-training
COLT '98 Proceedings of the eleventh annual conference on Computational learning theory

Information Retrieval
Information Retrieval

Gene name identification and normalization using a model organism database
Journal of Biomedical Informatics - Special issue: Named entity recognition in biomedicine

Investigating GIS and smoothing for maximum entropy taggers
EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1

Introduction to the CoNLL-2003 shared task: language-independent named entity recognition
CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4

Multi-criteria-based active learning for named entity recognition
ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics

Cited 9

Active learning for anaphora resolution
HLT '09 Proceedings of the NAACL HLT 2009 Workshop on Active Learning for Natural Language Processing

Accelerating the annotation of sparse named entities by dynamic sentence selection
BioNLP '08 Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing

Annotation of chemical named entities
BioNLP '07 Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing

BaseNPs that contain gene names: domain specificity and genericity
BioNLP '07 Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing

Evaluating and combining biomedical named entity recognition systems
BioNLP '07 Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing

Statistical anaphora resolution in biomedical texts
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1

Automatic acquisition of huge training data for bio-medical named entity recognition
BioNLP '11 Proceedings of BioNLP 2011 Workshop

A bootstrapping approach for training a NER with conditional random fields
EPIA'11 Proceedings of the 15th Portuguese conference on Progress in artificial intelligence

Boosting the protein name recognition performance by bootstrapping on selected text
BioNLP '12 Proceedings of the 2012 Workshop on Biomedical Natural Language Processing

Abstract

We demonstrate that bootstrapping a gene name recognizer for FlyBase curation from automatically annotated noisy text is more effective than fully supervised training of the recognizer on more general manually annotated biomedical text. We present a new test set for this task based on an annotation scheme which distinguishes gene names from gene mentions, enabling a more consistent annotation. Evaluating our recognizer using this test set indicates that performance on unseen genes is its main weakness. We evaluate extensions to the technique used to generate training data designed to ameliorate this problem.