An Experiment in Automatic Classification of Pathological Reports

Authors:
Janneke van der Zwaan; Erik Tjong Kim Sang; Maarten de Rijke
Affiliations:
ISLA, University of Amsterdam, Amsterdam, The Netherlands;ISLA, University of Amsterdam, Amsterdam, The Netherlands;ISLA, University of Amsterdam, Amsterdam, The Netherlands
Venue:
AIME '07 Proceedings of the 11th conference on Artificial Intelligence in Medicine
Year:
2007

Abstract

Medical reports are predominantly written in natural language; as such they are not computer-accessible. A common way to make medical narrative accessible to automated systems is by assigning "computer-understandable" keywords from a controlled vocabulary. Experts usually perform this task by hand. In this paper, we investigate methods to support or automate this type of medical classification. We report on experiments using the PALGA data set, a collection of 14 million pathological reports, each of which has been classified by a domain expert. We describe methods for automatically categorizing the documents in this data set in an accurate way. In order to evaluate the proposed automatic classification approaches, we compare their output with that of two additional human annotators. While the automatic system performs well in comparison with humans, the inconsistencies within the annotated data constrain the maximum attainable performance.
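
The comparison described in the abstract, system output versus additional human annotators, both measured against the original expert classification, can be illustrated with a small sketch. The abstract does not state which evaluation measure was used, so micro-averaged precision, recall and F1 over per-report keyword sets is an assumption here, and the report IDs and keywords are invented toy data rather than PALGA terms.

```python
from typing import Dict, Set, Tuple


def micro_prf(gold: Dict[str, Set[str]],
              pred: Dict[str, Set[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over per-report keyword sets."""
    tp = fp = fn = 0
    for report_id, gold_terms in gold.items():
        pred_terms = pred.get(report_id, set())
        tp += len(gold_terms & pred_terms)   # keywords both sides assigned
        fp += len(pred_terms - gold_terms)   # assigned but not in the reference
        fn += len(gold_terms - pred_terms)   # in the reference but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical toy data: IDs and keywords are made up for illustration only.
expert = {"r1": {"skin", "biopsy", "naevus"}, "r2": {"colon", "biopsy"}}
system = {"r1": {"skin", "biopsy"}, "r2": {"colon"}}
second_annotator = {"r1": {"skin", "excision", "naevus"},
                    "r2": {"colon", "biopsy"}}

# Score the classifier and a second human against the same expert labels.
system_scores = micro_prf(expert, system)
human_scores = micro_prf(expert, second_annotator)
print("system:", system_scores)
print("human: ", human_scores)
```

Scoring a second human annotator with the same function gives a reference point for the system: if trained annotators disagree with the expert labels to a similar degree, that disagreement bounds how well any classifier trained and evaluated on those labels can be expected to score.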