Multi-label Associative Classification of Medical Documents from MEDLINE

  • Authors: Rafal Rak; Lukasz Kurgan; Marek Reformat

  • Affiliations: University of Alberta, Canada; University of Alberta, Canada; University of Alberta, Canada

  • Venue: ICMLA '05 Proceedings of the Fourth International Conference on Machine Learning and Applications
  • Year: 2005

Abstract

Providing convenient access to scientific documents has become a difficult problem because of the large and constantly increasing number of incoming documents and the extensive manual work associated with their storage, description, and classification. Users therefore need intelligent search and classification capabilities to find the information they require. This is especially true for repositories of scientific medical articles, given their extensive use, large size, high volume of new documents, and well-maintained structure. This research aims to provide an automated method for classifying articles into the structure of medical document repositories, which would support the extensive manual work currently performed. The proposed method classifies articles from the largest medical repository, MEDLINE, using state-of-the-art data mining technology. The method is based on a novel associative classification technique that considers recurrent items and, most importantly, the multi-label characteristic of the MEDLINE data. In large-scale experiments that use 350,000 documents, several classification algorithms are compared, including both recurrent and non-recurrent associative classification. The algorithms are capable of assigning each medical document to several classes (multi-label classification) and are characterized by relatively high accuracy. We also investigate different measures of classification quality and point out the pros and cons of each. Based on the experimental results, we show that recurrent-item-based associative classification demonstrates superior performance, and we propose three alternative setups that allow the user to obtain different desired classification qualities.
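
The abstract does not say which measures of classification quality the authors compare. As an illustration only, the sketch below computes four measures commonly used for multi-label problems: example-based precision, recall, and F1, plus the exact-match ratio. The function name `multilabel_scores` and the sample labels are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def multilabel_scores(true: List[Set[str]], pred: List[Set[str]]) -> Dict[str, float]:
    """Example-based precision, recall, F1 and exact-match ratio,
    each averaged over documents. Illustrative only; not the paper's measures."""
    if len(true) != len(pred) or not true:
        raise ValueError("need two non-empty lists of equal length")
    precision = recall = f1 = exact = 0.0
    for t, p in zip(true, pred):
        hits = len(t & p)                      # labels predicted correctly
        prec = hits / len(p) if p else 0.0     # share of predicted labels that are right
        rec = hits / len(t) if t else 0.0      # share of true labels that were found
        precision += prec
        recall += rec
        f1 += 2 * prec * rec / (prec + rec) if hits else 0.0
        exact += 1.0 if t == p else 0.0        # strict: whole label set must match
    n = len(true)
    return {
        "precision": precision / n,
        "recall": recall / n,
        "f1": f1 / n,
        "exact_match": exact / n,
    }


# Hypothetical labels for two documents (assumed for illustration).
true_labels = [{"Humans", "Neoplasms"}, {"Animals"}]
pred_labels = [{"Humans"}, {"Animals", "Mice"}]
scores = multilabel_scores(true_labels, pred_labels)
```

In this toy case both documents are partly right, so precision and recall are nonzero while the exact-match ratio is zero. That contrast is one reason a multi-label study might report several measures side by side.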