A shared task involving multi-label classification of clinical free text

  • Authors:
  • John P. Pestian; Christopher Brew; Paweł Matykiewicz; D. J. Hovermale; Neil Johnson; K. Bretonnel Cohen; Włodzisław Duch

  • Affiliations:
  • University of Cincinnati; Ohio State University; University of Cincinnati and Nicolaus Copernicus University, Toruń, Poland; Ohio State University; University of Cincinnati; University of Colorado; Nicolaus Copernicus University, Toruń, Poland

  • Venue:
  • BioNLP '07 Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing
  • Year:
  • 2007

Abstract

This paper reports on a shared task involving the assignment of ICD-9-CM codes to radiology reports. Two features distinguished this task from previous shared tasks in the biomedical domain. One is that it resulted in the first freely distributable corpus of fully anonymized clinical text. This resource is permanently available and will (we hope) facilitate future research. The other key feature of the task is that it required categorization with respect to a large and commercially significant set of labels. The number of participants was larger than in any previous biomedical challenge task. We describe the data production process and the evaluation measures, and give a preliminary analysis of the results. Many systems performed at levels approaching the inter-coder agreement, suggesting that human-like performance on this task is within the reach of currently available technologies.