Journal of Biomedical Informatics
An increasing need for collaboration and resource sharing in the Natural Language Processing (NLP) research and development community motivates efforts to create and share a common data model and a common terminology for all information annotated and extracted from clinical text. We have combined two existing standards, the HL7 Clinical Document Architecture (CDA) and the ISO Graph Annotation Format (GrAF; in development), to develop such a data model, entitled "CDA+GrAF". We experimented with several methods of combining these standards and eventually selected one that wraps separate CDA and GrAF parts in a common standoff annotation XML document (i.e., an annotation document stored separately from the annotated text). Two use cases, clinical document sections and the 2010 i2b2/VA NLP Challenge (i.e., problems, tests, and treatments, with their assertions and relations), were used to create examples of such standoff annotation documents, and these examples were successfully validated against the XML schemata provided with both standards. We developed a tool to automatically translate annotation documents from the 2010 i2b2/VA NLP Challenge format to GrAF, and used it to automatically generate 50 annotation documents, all of which were successfully validated. Finally, we adapted the XSL stylesheet provided with HL7 CDA to allow viewing annotation XML documents in a web browser, and we plan to adapt existing tools for translating annotation documents between CDA+GrAF and the UIMA and GATE frameworks. This common data model may make it easier to compare NLP tools and applications directly, combine their output, transform and "translate" annotations between different NLP applications, and eventually enable "plug-and-play" of different modules in NLP applications.
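The abstract mentions a tool that translates 2010 i2b2/VA NLP Challenge annotations into GrAF but does not describe how it works. The Python below is a hypothetical sketch of the core step, not the authors' tool. It turns i2b2 concept lines into a simplified GrAF-style standoff graph: a `region` holds character offsets into the clinical text, a `node` links to that region, and an `a` (annotation) element attaches a feature structure carrying the concept type. Several details are assumptions. The i2b2 line syntax and offset convention (1-based line numbers, 0-based whitespace-token indices, inclusive end token) are assumed. The element and attribute names are modeled on published GrAF examples but have not been validated against the ISO schema. The helper names `token_char_offsets` and `concepts_to_graf` are illustrative only. The GrAF header and namespace, the CDA wrapper, and assertion and relation lines are all omitted.

```python
import re
import xml.etree.ElementTree as ET

# Clark-notation name for xml:id; ElementTree serializes it as "xml:id".
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# ASSUMPTION: i2b2/VA 2010 concept lines look like
#   c="chest pain" 2:2 2:3||t="problem"
# with 1-based line numbers, 0-based whitespace-token indices, and an
# inclusive end token. Assertion and relation lines are not handled here.
CONCEPT_RE = re.compile(r'c="(.*)" (\d+):(\d+) (\d+):(\d+)\|\|t="(\w+)"$')


def token_char_offsets(text):
    """Map (line_no, token_idx) to (start, end) character offsets in text."""
    offsets = {}
    line_start = 0
    for line_no, line in enumerate(text.split("\n"), start=1):
        for tok_idx, tok in enumerate(re.finditer(r"\S+", line)):
            offsets[(line_no, tok_idx)] = (line_start + tok.start(),
                                           line_start + tok.end())
        line_start += len(line) + 1  # skip the "\n" separator
    return offsets


def concepts_to_graf(text, concept_lines):
    """Build a simplified GrAF-style standoff graph (hypothetical helper).

    The clinical text is never copied into the output: each annotation
    points at a region defined only by character offsets into `text`.
    """
    offsets = token_char_offsets(text)
    # The GrAF namespace declaration and header are omitted for brevity.
    graph = ET.Element("graph")
    for i, line in enumerate(concept_lines, start=1):
        m = CONCEPT_RE.match(line.strip())
        if m is None:
            raise ValueError(f"unrecognized concept line: {line!r}")
        _surface, l1, t1, l2, t2, ctype = m.groups()
        start = offsets[(int(l1), int(t1))][0]
        end = offsets[(int(l2), int(t2))][1]
        # Region: a stretch of primary text, given as "start end" anchors.
        ET.SubElement(graph, "region",
                      {XML_ID: f"r{i}", "anchors": f"{start} {end}"})
        # Node: a graph vertex linked to that region.
        node = ET.SubElement(graph, "node", {XML_ID: f"n{i}"})
        ET.SubElement(node, "link", {"targets": f"r{i}"})
        # Annotation: a feature structure attached to the node.
        ann = ET.SubElement(graph, "a",
                            {"label": "concept", "ref": f"n{i}", "as": "i2b2"})
        fs = ET.SubElement(ann, "fs")
        ET.SubElement(fs, "f", {"name": "type", "value": ctype})
    return ET.tostring(graph, encoding="unicode")
```

For example, with `doc = "Admission Date :\nPatient reports chest pain today ."`, calling `concepts_to_graf(doc, ['c="chest pain" 2:2 2:3||t="problem"'])` produces a region whose anchors select exactly "chest pain" from `doc`, while that string never appears in the XML itself. Keeping the text and the annotations apart in this way is what makes the annotation standoff, and it is why separate annotation layers can point at the same clinical document.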