Classification techniques with minimal labelling effort and application to medical reports

  • Authors: Fathi H. Saad; G. Duncan Bell; Beatriz De la Iglesia

  • Affiliations: School of Computing Sciences, University of East Anglia, Norwich NR4 7TJ, UK; Endoscopy Unit, Norwich and Norfolk University Hospital, Colney Lane, Norwich NR4 7UY, UK; School of Computing Sciences, University of East Anglia, Norwich NR4 7TJ, UK

  • Venue: International Journal of Data Mining and Bioinformatics
  • Year: 2008

Abstract

There are a number of approaches to classifying text documents. Here, we use Partially Supervised Classification (PSC) and argue that it is an effective and efficient approach for real-world problems. PSC uses a two-step strategy to cut down on the labelling effort, and a number of methods have been proposed for each step. We evaluate various methods on real-world medical documents. The results show that using Expectation-Maximisation (EM) to build the classifier yields better results than using a Support Vector Machine (SVM). We also show experimentally that careful selection of a subset of features to represent the documents can improve performance.
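
The abstract does not spell out the two steps, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes the formulation commonly used in partially supervised (positive and unlabelled) text classification: step 1 picks "reliable negative" documents out of the unlabelled set, and step 2 builds a classifier iteratively, here with EM over a multinomial naive Bayes model. The step-1 heuristic, all function names, and the toy documents are hypothetical choices made for illustration; both document lists are assumed to be non-empty.

```python
import math
from collections import Counter

def tokens(doc):
    return doc.lower().split()

def reliable_negative_indices(positive, unlabeled):
    """Step 1 (assumed heuristic): a word is a 'positive feature' if it occurs
    in a larger fraction of positive than of unlabeled documents. An unlabeled
    document containing no positive feature is treated as a reliable negative."""
    def doc_freq(docs):
        df = Counter()
        for d in docs:
            df.update(set(tokens(d)))
        return df
    df_p, df_u = doc_freq(positive), doc_freq(unlabeled)
    pos_features = {w for w, c in df_p.items()
                    if c / len(positive) > df_u[w] / len(unlabeled)}
    return [i for i, d in enumerate(unlabeled)
            if not pos_features & set(tokens(d))]

def train_nb(docs, weights, vocab):
    """Multinomial naive Bayes with Laplace smoothing.
    weights[i] is the (possibly soft) probability that docs[i] is positive."""
    prior = {0: 1.0, 1: 1.0}
    counts = {0: Counter(), 1: Counter()}
    for d, w in zip(docs, weights):
        prior[1] += w
        prior[0] += 1.0 - w
        for t in tokens(d):
            counts[1][t] += w
            counts[0][t] += 1.0 - w
    total = prior[0] + prior[1]
    model = {}
    for c in (0, 1):
        denom = sum(counts[c].values()) + len(vocab)
        log_lik = {t: math.log((counts[c][t] + 1.0) / denom) for t in vocab}
        model[c] = (math.log(prior[c] / total), log_lik)
    return model

def posterior(model, doc):
    """Probability that doc is positive; words outside the vocabulary are ignored."""
    score = {}
    for c in (0, 1):
        log_prior, log_lik = model[c]
        score[c] = log_prior + sum(log_lik[t] for t in tokens(doc) if t in log_lik)
    m = max(score.values())
    e0, e1 = math.exp(score[0] - m), math.exp(score[1] - m)
    return e1 / (e0 + e1)

def psc_em(positive, unlabeled, iterations=5):
    """Step 2 (assumed): start from positives vs. reliable negatives, then run EM
    so the remaining unlabeled documents contribute with soft labels.
    Simplification: positives and reliable negatives keep fixed labels."""
    rn = set(reliable_negative_indices(positive, unlabeled))
    negatives = [d for i, d in enumerate(unlabeled) if i in rn]
    rest = [d for i, d in enumerate(unlabeled) if i not in rn]
    vocab = {t for d in positive + unlabeled for t in tokens(d)}
    fixed_docs = positive + negatives
    fixed_w = [1.0] * len(positive) + [0.0] * len(negatives)
    model = train_nb(fixed_docs, fixed_w, vocab)
    for _ in range(iterations):
        soft = [posterior(model, d) for d in rest]             # E-step
        model = train_nb(fixed_docs + rest, fixed_w + soft, vocab)  # M-step
    return model

# Toy data, invented for illustration only.
positive = ["colonoscopy polyp removed", "polyp found in colon",
            "colon polyp biopsy taken"]
unlabeled = ["large polyp removed from colon", "normal stomach no ulcer",
             "gastric ulcer seen in stomach", "stomach normal",
             "polyp biopsy colon"]
model = psc_em(positive, unlabeled)
print(posterior(model, "polyp biopsy colon"))
```

Only the positive documents need to be labelled by hand in this setup, which is where the saving in labelling effort comes from. Swapping the naive Bayes and EM loop for an iteratively retrained SVM would give the alternative step-2 method that the abstract compares against.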