A prototype tool set to support machine-assisted annotation

  • Authors:
  • Brett R. South; Shuying Shen; Jianwei Leng; Tyler B. Forbush; Scott L. DuVall; Wendy W. Chapman

  • Affiliations:
  • Biomedical Informatics and Internal Medicine; Biomedical Informatics and Internal Medicine; Internal Medicine; IDEAS Center, SLC VA Healthcare System, Salt Lake City, Utah; Radiology, University of Utah, Salt Lake City, Utah and IDEAS Center, SLC VA Healthcare System, Salt Lake City, Utah; University of California, San Diego, La Jolla, California

  • Venue:
  • BioNLP '12 Proceedings of the 2012 Workshop on Biomedical Natural Language Processing
  • Year:
  • 2012

Abstract

Manually annotating clinical document corpora to generate reference standards for Natural Language Processing (NLP) systems or Machine Learning (ML) is a time-consuming and labor-intensive endeavor. Although a variety of open source annotation tools currently exist, there is a clear opportunity to develop new tools and assess functionalities that introduce efficiencies into the process of generating reference standards. These features include: management of document corpora and batch assignment, integration of machine-assisted verification functions, semi-automated curation of annotated information, and support of machine-assisted pre-annotation. The goals of reducing annotator workload and improving the quality of reference standards are important considerations for the development of new tools. An infrastructure is also needed that will support large-scale but secure annotation of sensitive clinical data, as well as crowdsourcing, which has proven successful for a variety of annotation tasks. We introduce the Extensible Human Oracle Suite of Tools (eHOST, http://code.google.com/p/ehost), which provides these functionalities and, when coupled with server integration, offers an end-to-end solution for carrying out small-scale, large-scale, or crowdsourced annotation projects.