Building a semantically annotated corpus of clinical texts

Authors:
Angus Roberts; Robert Gaizauskas; Mark Hepple; George Demetriou; Yikun Guo; Ian Roberts; Andrea Setzer
Affiliations:
Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello, Sheffield S1 4DP, UK (all authors)
Venue:
Journal of Biomedical Informatics
Year:
2009

Abstract

In this paper, we describe the construction of a semantically annotated corpus of clinical texts for use in developing and evaluating systems that automatically extract clinically significant information from the textual component of patient records. The paper details the sampling of textual material from a collection of 20,000 cancer patient records, the development of a semantic annotation scheme, the annotation methodology, the distribution of annotations in the final corpus, and the use of the corpus to develop an adaptive information extraction system. The resulting corpus is the most richly semantically annotated resource for clinical text processing built to date, and its value has been demonstrated through its use in developing an effective information extraction system. The detailed presentation of our corpus construction and annotation methodology will be of value to others seeking to build high-quality semantically annotated corpora in biomedical domains.
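The abstract names the annotation methodology and the distribution of annotations in the final corpus without giving detail. As an illustration only, here is a minimal sketch of two statistics commonly reported for such corpora. It assumes stand-off annotations stored as typed character spans. The `Annotation` class, its fields, and the label names are hypothetical, not the corpus's actual format. The sketch computes a per-type annotation count and pairwise inter-annotator agreement as F1 over exact span-and-label matches. That agreement measure is a standard choice in the field, not necessarily the one the authors used.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical stand-off annotation: a typed character span in one document."""
    doc_id: str
    start: int   # character offset where the span begins (inclusive)
    end: int     # character offset where the span ends (exclusive)
    label: str   # entity type; the label names used below are illustrative only


def label_distribution(annotations):
    """Count the annotations of each entity type across the corpus."""
    return Counter(a.label for a in annotations)


def agreement_f1(reference, response):
    """Pairwise inter-annotator agreement as F1 over exact matches.

    An annotation counts as agreed only if document, offsets and label are
    all identical. Duplicates within one annotator's set are counted once.
    F1 = 2 * matched / (|reference| + |response|), so it is symmetric:
    swapping the two annotators gives the same score.
    """
    ref, resp = set(reference), set(response)
    if not ref and not resp:
        return 1.0  # two empty annotation sets agree trivially
    matched = len(ref & resp)
    return 2 * matched / (len(ref) + len(resp))


# Usage with made-up annotations from two annotators on one document.
annotator_a = [
    Annotation("doc1", 0, 9, "Condition"),
    Annotation("doc1", 20, 31, "Drug"),
    Annotation("doc1", 40, 48, "Investigation"),
]
annotator_b = [
    Annotation("doc1", 0, 9, "Condition"),
    Annotation("doc1", 20, 31, "Intervention"),  # same span, different label
    Annotation("doc1", 40, 48, "Investigation"),
]

print(label_distribution(annotator_a + annotator_b))
print(round(agreement_f1(annotator_a, annotator_b), 3))  # 0.667
```

Exact matching is the strictest option. A lenient variant that accepts overlapping spans with the same label is a common alternative when annotators disagree mainly on span boundaries.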