CIE-9-MC Code Classification with knn and SVM

  • Authors:
  • David Lojo; David E. Losada; Álvaro Barreiro

  • Affiliations:
  • IRLab. Dep. de Computación, Universidade da Coruña, Spain and Servicio de Informática, Complexo Hospitalario Universitario de Santiago, Santiago de Compostela, Spain;Grupo de Sistemas Inteligentes, Dep. de Electrónica y Computación, Universidade de Santiago de Compostela, Spain;IRLab. Dep. de Computación, Universidade da Coruña, Spain

  • Venue:
  • IWINAC '09 Proceedings of the 3rd International Work-Conference on The Interplay Between Natural and Artificial Computation: Part II: Bioinspired Applications in Artificial and Natural Computation
  • Year:
  • 2009

Abstract

This paper addresses the automatic classification of texts in the medical domain. The task is to classify medical discharge reports into the classes defined by the CIE-9-MC codes. We assign CIE-9-MC codes to reports using either a kNN model or support vector machines (SVMs). One contribution of this work is the construction of the collection from the discharge reports of a medical service. The collection is difficult because of its large number of classes and the uneven distribution of documents across them. We study different representations of the collection, different classification models, and different weighting schemes for assigning CIE-9-MC codes. Our use of document expansion is particularly novel: the training documents are expanded with the descriptions of their assigned codes, taken from CIE-9-MC. We also apply SVMs to produce a ranking of classes for each test document. This innovative use of SVMs yields good results in such a difficult domain.
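
The document expansion and class ranking ideas described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy reports, the code descriptions in `CODE_DESCRIPTIONS`, the tf-idf weighting, and all function names are assumptions made for illustration, and the ranking here uses a simple kNN vote with cosine similarity rather than SVM scores.

```python
import math
from collections import Counter, defaultdict

# Hypothetical code descriptions, for illustration only
# (not taken from the paper's collection).
CODE_DESCRIPTIONS = {
    "410": "acute myocardial infarction",
    "486": "pneumonia organism unspecified",
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def expand(text, codes, descriptions):
    """Append the description of each assigned code to a training text."""
    extra = " ".join(descriptions[c] for c in codes if c in descriptions)
    return f"{text} {extra}".strip()


def tfidf_vectors(token_lists):
    """Return one sparse tf-idf dict per document, plus the idf table."""
    n = len(token_lists)
    df = Counter(t for toks in token_lists for t in set(toks))
    idf = {t: math.log((n + 1) / (d + 1)) + 1.0 for t, d in df.items()}
    vectors = [
        {t: tf * idf[t] for t, tf in Counter(toks).items()}
        for toks in token_lists
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_codes(test_text, train_vectors, train_codes, idf, k=3):
    """Rank codes by summed similarity over the k nearest training documents."""
    counts = Counter(tokenize(test_text))
    query = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    neighbours = sorted(
        ((cosine(query, v), codes) for v, codes in zip(train_vectors, train_codes)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for sim, codes in neighbours:
        for code in codes:
            scores[code] += sim
    # Highest score first; ties broken by code for a stable order.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


# Toy training set: (report text, assigned codes).
train = [
    ("patient admitted with chest pain and elevated troponin", ["410"]),
    ("fever cough and infiltrate on chest x-ray", ["486"]),
    ("chest pain radiating to left arm", ["410"]),
]
expanded = [tokenize(expand(text, codes, CODE_DESCRIPTIONS)) for text, codes in train]
vectors, idf = tfidf_vectors(expanded)
ranking = rank_codes(
    "acute myocardial infarction suspected",
    vectors,
    [codes for _, codes in train],
    idf,
    k=2,
)
```

In this toy setup the test report shares no vocabulary with the raw training reports, so it can only be matched through the appended code descriptions. A per-class SVM decision score could be substituted for the summed kNN similarity to obtain an SVM-based ranking of classes.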