Manually annotating clinical document corpora to generate reference standards for Natural Language Processing (NLP) or Machine Learning (ML) systems is a time-consuming and labor-intensive endeavor. Although a variety of open-source annotation tools currently exist, there is a clear opportunity to develop new tools and assess functionalities that introduce efficiencies into the process of generating reference standards. These features include management of document corpora and batch assignment, integration of machine-assisted verification functions, semi-automated curation of annotated information, and support for machine-assisted pre-annotation. The goals of reducing annotator workload and improving the quality of reference standards are important considerations in the development of new tools. An infrastructure is also needed to support large-scale yet secure annotation of sensitive clinical data, as well as crowdsourcing, which has proven successful for a variety of annotation tasks. We introduce the Extensible Human Oracle Suite of Tools (eHOST, http://code.google.com/p/ehost), which provides these functionalities and, when coupled with server integration, offers an end-to-end solution for carrying out small-scale, large-scale, or crowdsourced annotation projects.
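To make the idea of machine-assisted pre-annotation concrete, the sketch below shows a minimal dictionary-based pre-annotator that emits stand-off span annotations for a human to verify. This is an illustrative assumption, not eHOST's actual implementation: the Annotation class, the TERM_DICT dictionary, and the pre_annotate function are hypothetical names introduced here for the example.

```python
# A minimal sketch of dictionary-based pre-annotation, assuming a simple
# stand-off annotation model (class, dictionary, and function names are
# illustrative only, not part of eHOST's API).
import re
from dataclasses import dataclass


@dataclass
class Annotation:
    start: int   # character offset where the span begins
    end: int     # character offset where the span ends
    text: str    # the matched surface text
    label: str   # the annotation class (e.g., a clinical concept type)


# Hypothetical term dictionary mapping surface forms to annotation classes.
TERM_DICT = {
    "pneumonia": "Diagnosis",
    "chest x-ray": "Procedure",
    "amoxicillin": "Medication",
}


def pre_annotate(text: str, term_dict: dict[str, str]) -> list[Annotation]:
    """Scan a document for dictionary terms and return stand-off spans
    that an annotator can then verify, correct, or delete."""
    annotations = []
    for term, label in term_dict.items():
        # Whole-word, case-insensitive matching keeps false positives down.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            annotations.append(
                Annotation(match.start(), match.end(), match.group(), label)
            )
    # Sort by offset so pre-annotations appear in document order for review.
    return sorted(annotations, key=lambda a: a.start)


if __name__ == "__main__":
    note = "Chest X-ray confirmed pneumonia; started amoxicillin 500 mg."
    for ann in pre_annotate(note, TERM_DICT):
        print(f"[{ann.start}:{ann.end}] {ann.label}: {ann.text}")
```

The design point is that pre-annotation shifts the annotator's task from creating spans from scratch to verifying and correcting machine-proposed spans, which is the workload reduction the abstract describes.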