Classification techniques with minimal labelling effort and application to medical reports

Authors:
Fathi H. Saad;G. Duncan Bell;Beatriz De la Iglesia
Affiliations:
School of Computing Sciences, University of East Anglia, Norwich NR4 7TJ, UK.;Endoscopy Unit, Norwich and Norfolk University Hospital, Colney Lane, Norwich NR4 7UY, UK.;School of Computing Sciences, University of East Anglia, Norwich NR4 7TJ, UK
Venue:
International Journal of Data Mining and Bioinformatics
Year:
2008

Citing 19
Cited 0

Evaluating text categorization

HLT '91 Proceedings of the workshop on Speech and Natural Language
The nature of statistical learning theory

The nature of statistical learning theory
An algorithm for suffix stripping

Readings in information retrieval
Combining labeled and unlabeled data with co-training

COLT' 98 Proceedings of the eleventh annual conference on Computational learning theory
Learning to classify text from labeled and unlabeled documents

AAAI '98/IAAI '98 Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence
A re-examination of text categorization methods

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Text Classification from Labeled and Unlabeled Documents using EM

Machine Learning - Special issue on information retrieval
Introduction to Modern Information Retrieval

Introduction to Modern Information Retrieval
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features

ECML '98 Proceedings of the 10th European Conference on Machine Learning
Partially Supervised Classification of Text Documents

ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
Transductive Inference for Text Classification using Support Vector Machines

ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
PEBL: positive example based learning for Web page classification using SVM

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Combining clustering and co-training to enhance text classification using unlabelled data

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Building Text Classifiers Using Positive and Unlabeled Examples

ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
CBC: Clustering Based Text Classification Requiring Minimal Labeled Data

ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
The SMART Retrieval System—Experiments in Automatic Document Processing

The SMART Retrieval System—Experiments in Automatic Document Processing
Learning to classify texts using positive and unlabeled data

IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence
Beyond TFIDF weighting for text categorization in the vector space model

IJCAI'05 Proceedings of the 19th international joint conference on Artificial intelligence
Learning from positive and unlabeled examples with different data distributions

ECML'05 Proceedings of the 16th European conference on Machine Learning

Quantified Score

Hi-index	0.00

Visualization

Abstract

There are a number of approaches to classify text documents. Here, we use Partially Supervised Classification (PSC) and argue that it is an effective and efficient approach for real-world problems. PSC uses a two-step strategy to cut down on the labelling effort. There are a number of methods that have been proposed for each step. An evaluation of various methods is conducted using real-world medical documents. The results show that using EM to build the classifier yields better results than SVM. We also experimentally show that careful selection of a subset of features to represent the documents can improve performance.