We apply hierarchical supervised learning to the problem of assigning codes from the well-known ACR Index (a "double-hierarchy" classification scheme from the American College of Radiology) to radiology reports. This task is actually two classification tasks in one: the first uses a hierarchy of codes describing anatomic locations, and the second uses a hierarchy of codes describing pathologies, with the two hierarchies closely intertwined. The requirement is that each document be placed in exactly one node of depth ≥ 2 of the "anatomic location" hierarchy and in exactly one node of depth ≥ 3 of the "pathology" hierarchy. This makes our task a (fairly uncommon) variable-constraint classification task: at the first levels of each hierarchy (2 for anatomic location, 3 for pathology) we need a standard "exactly 1 class per document" constraint, while at the lower levels we need an "at most 1 class per document" constraint. We used a large dataset of about 250,000 radiology reports written in Italian and an adaptation of our TreeBoost.MH learning algorithm to variable-constraint classification. Despite the extreme difficulty of the task (the two codes had to be picked out of a pool of 719 codes for anatomic location and 5269 codes for pathology, respectively), our system displayed good accuracy, indicating that it may be a viable tool for semi-automated classification of medical reports. We also analyzed the quantification accuracy of our system (i.e., its ability to correctly estimate the frequency of the individual codes), a concern of special interest in epidemiology; the results show that our system has excellent quantification accuracy, making it a valuable tool for the fully automated coding of radiology reports for epidemiological purposes.
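The variable-constraint idea can be illustrated with a minimal sketch. It is not the TreeBoost.MH adaptation itself, only a greedy top-down descent under stated assumptions: the depth requirement is read as a minimum depth, each node has a real-valued classifier score where a positive value means "accept", and every branch of the tree reaches the minimum depth. The names `assign_code`, `estimate_frequencies`, `tree`, and `scores` are hypothetical. The second function shows the naive "classify and count" way of estimating code frequencies, which may differ from the quantification method evaluated here.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

Tree = Dict[str, List[str]]  # hypothetical encoding: node -> list of child nodes


def assign_code(tree: Tree, score: Callable[[str], float],
                root: str, min_depth: int) -> List[str]:
    """Greedy top-down descent under a variable constraint.

    Above min_depth the best-scoring child is always taken
    ("exactly 1 class per document"). From min_depth downward the best
    child is taken only if its score is positive ("at most 1 class").
    """
    path: List[str] = []
    node, depth = root, 0
    while tree.get(node):
        best = max(tree[node], key=score)
        if depth >= min_depth and score(best) <= 0.0:
            break  # optional level and no child was accepted: stop here
        path.append(best)
        node, depth = best, depth + 1
    return path


def estimate_frequencies(predicted: Sequence[str]) -> Dict[str, float]:
    """Classify-and-count baseline: relative frequency of each predicted code."""
    if not predicted:
        return {}
    counts = Counter(predicted)
    return {code: n / len(predicted) for code, n in counts.items()}


# Toy hierarchy and made-up scores, for illustration only.
tree = {
    "root": ["1", "2"],
    "1": ["1.1", "1.2"],
    "2": ["2.1"],
    "1.1": ["1.1.1", "1.1.2"],
}
scores = {"1": -0.3, "2": -0.9, "1.1": 0.4, "1.2": 0.1,
          "2.1": 0.5, "1.1.1": -0.2, "1.1.2": -0.5}

print(assign_code(tree, lambda n: scores[n], "root", min_depth=2))
# prints ['1', '1.1']
```

In the toy run, node "1" is chosen even though its score is negative, because a choice is mandatory at the first two levels; the descent then stops at "1.1" because no third-level child has a positive score.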