Variable-constraint classification and quantification of radiology reports under the ACR Index

Authors:
Stefano Baccianella; Andrea Esuli; Fabrizio Sebastiani
Affiliations:
Istituto di Scienza e Tecnologie dell'Informazione, Consiglio Nazionale delle Ricerche, 56124 Pisa, Italy (all three authors)
Venue:
Expert Systems with Applications: An International Journal
Year:
2013

Abstract

We apply hierarchical supervised learning to the problem of assigning codes from the well-known ACR Index (a "double-hierarchy" classification scheme from the American College of Radiology) to radiology reports. This task is actually two classification tasks in one: the first uses a hierarchy of codes describing anatomic locations, and the second uses a hierarchy of codes describing pathologies, where the two hierarchies are closely intertwined. Each task requires that the document be placed in exactly one node of depth ≥ 2 of the "anatomic location" hierarchy and in exactly one node of depth ≥ 3 of the "pathology" hierarchy. This makes our task a (fairly uncommon) variable-constraint classification task: at the first levels of each hierarchy (2 for anatomic location, 3 for pathology) we need a standard "exactly 1 class per document" constraint, while at the lower levels we need an "at most 1 class per document" constraint. We used a large dataset of about 250,000 radiology reports written in Italian and an adaptation of our TreeBoost.MH learning algorithm to variable-constraint classification. Despite the extreme difficulty of the task (the two codes had to be picked out of pools of 719 codes for anatomic location and 5269 codes for pathology, respectively), our system displayed good accuracy, indicating that it may be a viable tool for the semi-automated classification of medical reports. We also analyzed the quantification accuracy of our system (i.e., its ability to correctly estimate the frequency of the individual codes), a concern of special interest in epidemiology; the results show that our system has excellent quantification accuracy, making it a valuable tool for the fully automated coding of radiology reports for epidemiological purposes.
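The variable constraint described in the abstract can be illustrated with a minimal Python sketch. It is not the TreeBoost.MH adaptation itself: the node labels, the per-node scores, and the functions `classify` and `classify_and_count` are hypothetical, and the frequency estimate shown is the simple classify-and-count baseline, which the abstract does not confirm as the method used. The sketch assumes each node has a real-valued score where a positive value means "accept".

```python
from collections import Counter


def classify(scores, children, root, mandatory_depth):
    """Pick one node of a hierarchy under a variable constraint.

    scores:          dict mapping node -> real-valued score (positive = accept);
                     hypothetical stand-in for the output of per-node classifiers
    children:        dict mapping node -> list of child nodes
    mandatory_depth: number of top levels where "exactly 1 class" applies
                     (2 for anatomic location, 3 for pathology in the abstract)
    """
    node, depth = root, 0
    # Descend until a leaf is reached or the "at most 1" rule rejects all children.
    while children.get(node):
        best = max(children[node], key=lambda c: scores[c])
        # Below the mandatory levels the constraint is "at most 1 class":
        # stop here if even the best child is not accepted.
        if depth >= mandatory_depth and scores[best] <= 0:
            break
        # Within the mandatory levels the constraint is "exactly 1 class":
        # the best child is taken even when its score is negative.
        node, depth = best, depth + 1
    return node


def classify_and_count(predicted_codes):
    """Estimate each code's relative frequency from the predicted codes."""
    total = len(predicted_codes)
    return {code: n / total for code, n in Counter(predicted_codes).items()}


# Toy hierarchy with made-up labels (not real ACR codes).
children = {
    "root": ["A", "B"],
    "A": ["A1", "A2"],
    "A1": ["A11", "A12"],
    "B": ["B1"],
}
scores = {"A": -0.2, "B": -0.5, "A1": 0.3, "A2": 0.1,
          "A11": -0.4, "A12": -0.1, "B1": 0.9}

chosen = classify(scores, children, "root", mandatory_depth=2)
frequencies = classify_and_count(["A1", "A1", "A12", "B1"])
```

In this toy run the top-level choice falls on "A" although its score is negative, because a class is mandatory at the first two levels; the descent then stops at "A1" because no node at the third level is accepted.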