Accelerating the annotation of sparse named entities by dynamic sentence selection

Authors:
Yoshimasa Tsuruoka;Jun'ichi Tsujii;Sophia Ananiadou
Affiliations:
The University of Manchester, UK;The University of Manchester, UK and The University of Tokyo, Japan and National Centre for Text Mining (NaCTeM), Manchester, UK;The University of Manchester, UK and National Centre for Text Mining (NaCTeM), Manchester, UK
Venue:
BioNLP '08 Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing
Year:
2008

Abstract

This paper presents an active-learning-like framework for reducing the human effort required to annotate named entities in a corpus. In this framework, annotation is performed as an iterative, interactive process between the human annotator and a probabilistic named entity tagger. At each iteration, the tagger selects the sentences that are most likely to contain named entities of the target category and presents them to the annotator. This process is repeated until the estimated coverage reaches the desired level. Unlike active learning approaches, our framework produces a named entity corpus that is free from the sampling bias introduced by the active strategy. We evaluated our framework by simulating the annotation process on two named entity corpora; the results show that our approach can drastically reduce the number of sentences to be annotated when applied to sparse named entities.
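The iterative loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the tagger or how coverage is estimated, so the sketch makes two assumptions: a toy cue-word scorer (`toy_scorer`) stands in for the probabilistic tagger, and coverage is estimated as the number of entity-bearing sentences found so far divided by that number plus the sum of the predicted probabilities over the sentences not yet annotated. The names `annotate_until_covered`, `toy_scorer`, `oracle`, `target_coverage`, and `batch_size` are hypothetical, as is the example corpus.

```python
from typing import Callable, List, Tuple

Labeled = List[Tuple[str, bool]]  # (sentence, contains a target entity?)


def annotate_until_covered(
    sentences: List[str],
    train_and_score: Callable[[Labeled, List[str]], List[float]],
    oracle: Callable[[str], bool],
    target_coverage: float = 0.9,
    batch_size: int = 2,
) -> Tuple[Labeled, float]:
    """Select likely entity-bearing sentences until estimated coverage is reached."""
    labeled: Labeled = []
    pool = list(sentences)
    while True:
        # Re-score the remaining pool using everything annotated so far.
        probs = train_and_score(labeled, pool) if pool else []
        found = sum(1 for _, has_entity in labeled if has_entity)
        expected_remaining = sum(probs)
        total = found + expected_remaining
        # Assumed estimator: found / (found + expected still unannotated).
        coverage = found / total if total > 0 else 1.0
        if coverage >= target_coverage or not pool:
            return labeled, coverage
        # Present the sentences most likely to contain a target entity.
        ranked = sorted(range(len(pool)), key=lambda i: probs[i], reverse=True)
        chosen = ranked[:batch_size]
        for i in chosen:
            labeled.append((pool[i], oracle(pool[i])))  # oracle simulates the annotator
        chosen_set = set(chosen)
        pool = [s for i, s in enumerate(pool) if i not in chosen_set]


def toy_scorer(labeled: Labeled, pool: List[str]) -> List[float]:
    """Stand-in for a probabilistic tagger (assumption, not a CRF).

    Words seen only in entity-bearing sentences become cues; a sentence
    with any cue gets probability 0.9, otherwise 0.1.
    """
    pos_words, neg_words = set(), set()
    for sent, has_entity in labeled:
        (pos_words if has_entity else neg_words).update(sent.lower().split())
    cues = pos_words - neg_words
    return [0.9 if set(s.lower().split()) & cues else 0.1 for s in pool]


# Hypothetical corpus in which the target entity ("p53") is sparse.
corpus = [
    "The p53 protein binds DNA",
    "We thank the reviewers",
    "Mutant p53 protein accumulates",
    "Samples were stored overnight",
    "The weather was mild",
    "Results are shown below",
]
labeled, coverage = annotate_until_covered(corpus, toy_scorer, lambda s: "p53" in s)
print(f"annotated {len(labeled)} of {len(corpus)} sentences, "
      f"estimated coverage {coverage:.2f}")
```

In this sketch the stopping decision depends entirely on how well calibrated the predicted probabilities are, so a real tagger would need reliable confidence estimates for the coverage figure to be meaningful.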