A prototype tool set to support machine-assisted annotation

  • Authors:
  • Brett R. South; Shuying Shen; Jianwei Leng; Tyler B. Forbush; Scott L. DuVall; Wendy W. Chapman

  • Affiliations:
  • Biomedical Informatics and Internal Medicine; Biomedical Informatics and Internal Medicine; Internal Medicine; IDEAS Center, SLC VA Healthcare System, Salt Lake City, Utah; Radiology, University of Utah, Salt Lake City, Utah and IDEAS Center, SLC VA Healthcare System, Salt Lake City, Utah; University of California, San Diego, La Jolla, California

  • Venue:
  • BioNLP '12 Proceedings of the 2012 Workshop on Biomedical Natural Language Processing
  • Year:
  • 2012

Abstract

Manually annotating clinical document corpora to generate reference standards for Natural Language Processing (NLP) systems or Machine Learning (ML) is a time-consuming and labor-intensive endeavor. Although a variety of open source annotation tools currently exist, there is a clear opportunity to develop new tools and assess functionalities that introduce efficiencies into the process of generating reference standards. These features include: management of document corpora and batch assignment, integration of machine-assisted verification functions, semi-automated curation of annotated information, and support of machine-assisted pre-annotation. The goals of reducing annotator workload and improving the quality of reference standards are important considerations for the development of new tools. An infrastructure is also needed that will support large-scale but secure annotation of sensitive clinical data, as well as crowdsourcing, which has proven successful for a variety of annotation tasks. We introduce the Extensible Human Oracle Suite of Tools (eHOST, http://code.google.com/p/ehost), which provides these functionalities and, when coupled with server integration, offers an end-to-end solution for carrying out small-scale, large-scale, or crowdsourced annotation projects.