Semi-supervised text categorization: Exploiting unlabeled data using ensemble learning algorithms

  • Authors:
  • Mohammad Reza Keyvanpour; Maryam Bahojb Imani

  • Affiliations:
  • Department of Computer Engineering, Alzahra University, Tehran, Iran; Department of Computer Engineering, Alzahra University, Tehran, Iran

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2013

Abstract

Text categorization is one of the fundamental tasks in text mining. Classical supervised methods need a lot of labeled data to train a classifier. Since assigning labels to a large amount of data is costly and time-consuming, it is useful to exploit unlabeled data sets. Many semi-supervised learning methods have therefore been studied recently. Among them, self-training is an important learning algorithm: it classifies unlabeled samples using a small number of labeled ones and adds the most confidently classified samples to the training set. In this paper, dynamic weighting is applied alongside a majority-vote approach to classify the unlabeled data into reliable and unreliable classes. The reliable data are then added to the training set, and the remaining data, including the unreliable data, are classified again in an iterative process. We tested this method on features extracted from ten common Reuters-21578 classes. Experimental results indicate that the proposed method is effective and improves classification performance.
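
The iterative procedure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the base learners, the weighting formula, or the reliability criterion, so everything below is an assumption made for illustration. The toy `CentroidClassifier`, the use of training-set accuracy as the "dynamic" weight, the `threshold` parameter, and the two-feature toy data are all hypothetical.

```python
from collections import defaultdict

class CentroidClassifier:
    """Toy nearest-centroid learner restricted to a subset of feature indices
    (a hypothetical stand-in for whatever base learners the paper uses)."""
    def __init__(self, dims):
        self.dims = dims
        self.centroids = {}

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(self.dims))
            for i, d in enumerate(self.dims):
                acc[i] += x[d]
            counts[label] += 1
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict(self, x):
        def dist(c):
            return sum((x[d] - m) ** 2 for d, m in zip(self.dims, self.centroids[c]))
        return min(sorted(self.centroids), key=dist)

def weighted_vote(models, weights, x):
    """Return (label, share): the winning label and its fraction of the total weight."""
    scores = defaultdict(float)
    for model, w in zip(models, weights):
        scores[model.predict(x)] += w
    label = max(sorted(scores), key=scores.get)
    total = sum(weights) or 1.0  # guard against all-zero weights
    return label, scores[label] / total

def self_train(models, X_lab, y_lab, X_unlab, threshold=0.9, max_iter=10):
    """Self-training loop: move reliably voted samples into the training set."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_iter):
        for m in models:
            m.fit(X_lab, y_lab)
        # Assumed form of dynamic weighting: each learner's accuracy on the
        # current labeled set, recomputed at every iteration.
        weights = [sum(m.predict(x) == t for x, t in zip(X_lab, y_lab)) / len(y_lab)
                   for m in models]
        reliable, unreliable = [], []
        for x in pool:
            label, share = weighted_vote(models, weights, x)
            (reliable if share >= threshold else unreliable).append((x, label))
        if not reliable:
            break  # nothing confident enough to add; stop iterating
        for x, label in reliable:
            X_lab.append(x)
            y_lab.append(label)
        pool = [x for x, _ in unreliable]
        if not pool:
            break
    return X_lab, y_lab, pool

# Toy data: two made-up features per document, labels borrowed from Reuters topic names.
X_labeled = [[3, 0], [0, 3], [4, 1], [1, 4]]
y_labeled = ["earn", "grain", "earn", "grain"]
X_unlabeled = [[5, 0], [0, 5], [5, 4]]

ensemble = [CentroidClassifier([0]), CentroidClassifier([1]), CentroidClassifier([0, 1])]
X_final, y_final, leftover = self_train(ensemble, X_labeled, y_labeled, X_unlabeled)
```

In this toy run the two clear-cut samples are voted on unanimously and join the training set, while `[5, 4]` splits the ensemble and stays in the unreliable pool. A stricter or looser `threshold` trades off how quickly the training set grows against how much label noise it admits.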