Semi-supervised text categorization: Exploiting unlabeled data using ensemble learning algorithms

Authors:
Mohammad Reza Keyvanpour;Maryam Bahojb Imani
Affiliations:
Department of Computer Engineering, Alzahra University, Tehran, Iran;Department of Computer Engineering, Alzahra University, Tehran, Iran
Venue:
Intelligent Data Analysis
Year:
2013

Abstract

Text categorization is one of the fundamental tasks in text mining. Classical supervised methods need a large amount of labeled data to train a classifier. Because labeling large amounts of data is costly and time-consuming, it is useful to exploit unlabeled data sets as well, and many semi-supervised learning methods have therefore been studied recently. Among them, self-training is an important algorithm: it classifies unlabeled samples using a classifier trained on a small number of labeled ones and adds the most confidently classified samples to the training set. In this paper, dynamic weighting is applied alongside majority voting to separate the unlabeled data into reliable and unreliable classes. The reliable data are then added to the training set, and the remaining data, including the unreliable data, are classified again in an iterative process. We tested this method on features extracted from ten common classes of the Reuters-21578 collection. The experimental results indicate that the proposed method is effective and improves classification performance.
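
The self-training loop described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the base learners, the weighting rule, or the reliability criterion, so everything below is an assumption made for illustration: the toy `WordOverlapClassifier`, the use of training-set accuracy as the dynamic weight, the `threshold` value, and all function names are hypothetical and are not the authors' implementation.

```python
from collections import Counter, defaultdict


class WordOverlapClassifier:
    """Toy ensemble member (hypothetical): votes for the class whose
    training documents share the most words with the input document."""

    def __init__(self, keep):
        self.keep = keep  # predicate choosing which words this member sees
        self.counts = defaultdict(Counter)

    def fit(self, docs, labels):
        self.counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.counts[label].update(w for w in doc.split() if self.keep(w))
        return self

    def predict(self, doc):
        words = [w for w in doc.split() if self.keep(w)]
        scores = {c: sum(cnt[w] for w in words) for c, cnt in self.counts.items()}
        if not any(scores.values()):
            return None  # abstain: no known word in this member's view
        return max(sorted(scores), key=scores.get)  # ties go to the first label


def weighted_vote(votes, weights):
    """Return (winning label, share of the total ensemble weight behind it)."""
    totals = defaultdict(float)
    for label, weight in zip(votes, weights):
        if label is not None:
            totals[label] += weight
    total = sum(weights)
    if not totals or total == 0:
        return None, 0.0
    winner = max(sorted(totals), key=totals.get)
    return winner, totals[winner] / total


def self_train(members, docs, labels, unlabeled, threshold=0.8, max_rounds=10):
    """Iteratively move reliably classified documents into the training set."""
    docs, labels, pool = list(docs), list(labels), list(unlabeled)
    for _ in range(max_rounds):
        if not pool:
            break
        for m in members:
            m.fit(docs, labels)
        # Assumed dynamic weight: each member's accuracy on the current training set.
        weights = [
            sum(m.predict(d) == y for d, y in zip(docs, labels)) / len(docs)
            for m in members
        ]
        reliable, unreliable = [], []
        for doc in pool:
            label, conf = weighted_vote([m.predict(doc) for m in members], weights)
            (reliable if conf >= threshold else unreliable).append((doc, label))
        if not reliable:
            break  # nothing new can be added, so stop iterating
        for doc, label in reliable:
            docs.append(doc)
            labels.append(label)
        pool = [doc for doc, _ in unreliable]  # retried in the next round
    return docs, labels, pool


# Three members, each seeing a different subset of the words (an assumed way
# of making the ensemble diverse).
members = [
    WordOverlapClassifier(lambda w: True),
    WordOverlapClassifier(lambda w: len(w) % 2 == 0),
    WordOverlapClassifier(lambda w: len(w) % 2 == 1),
]
docs, labels, leftover = self_train(
    members,
    ["wheat corn harvest", "oil crude barrel"],
    ["grain", "crude"],
    ["corn harvest export", "wheat export", "prices rise"],
)
print(labels, leftover)
```

On this toy data, "corn harvest export" is judged reliable in the first round. Once it has been added to the training set, every member recognizes a word in "wheat export", so that document becomes reliable in the second round. "prices rise" contains no known words and stays in the unreliable pool, which ends the loop.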