Integrating External Knowledge to Supplement Training Data in Semi-Supervised Learning for Text Categorization

Authors:
Mohammed Benkhalifa; Abdelhak Mouradi; Houssaine Bouyakhf
Affiliations:
School of Science and Engineering, Al Akhawayn University in Ifrane (AUI), Av. Hassan II, Ifrane 53000, Morocco. M.Benkhalifa@AlAkhawayn.ma
Ecole Nationale Superieure d'Informatique et d'Analyses des Systèmes (ENSIAS), Mohammed V University, Agdal Rabat, Morocco. mouradi@ensias.um5souissi.ac.ma
Computer Science Department, Mohammed V University, Faculty of Sciences in Rabat, Morocco. bouyakhf@fsr.ac.ma
Venue:
Information Retrieval
Year:
2001

Abstract

Text Categorization (TC) is the automated assignment of text documents to predefined categories based on their content. TC has served as an application domain for many learning approaches that have proved effective, yet it still poses many challenges to machine learning. In this paper, we propose integrating external lexical information from WordNet to supplement the training data of a semi-supervised clustering algorithm, the Semi-Supervised Fuzzy c-Means (ssFCM), which learns from both training and test documents in order to classify new, unseen documents. Our experiments use the Reuters-21578 collection and consist of binary classification tasks for categories selected from the 115 TOPICS classes of the Reuters collection. Using the Vector Space Model, each document is represented by its original feature vector augmented with an external feature vector generated from WordNet. We verify experimentally that integrating WordNet helps ssFCM improve its performance, effectively handles the classification of documents into categories with few training documents, and does not interfere with the use of training data.
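The abstract combines two mechanisms: augmenting each document's Vector Space Model vector with WordNet-derived features, and ssFCM, which learns from labeled training documents together with unlabeled test documents. The Python sketch below illustrates both under loudly stated assumptions. A tiny hand-written `TOY_LEXICON` stands in for WordNet (no real WordNet lookup is shown). Raw term frequencies replace whatever term weighting the authors used. The ssFCM update follows one common formulation (labeled memberships fixed to crisp labels, labeled documents weighted by a factor `w`, centroids initialized from labeled class means), which may differ in detail from the paper's. All names (`augment`, `vectorize`, `ssfcm`, `classify`, `TOY_LEXICON`) are hypothetical.

```python
# Hypothetical stand-in for WordNet: term -> related terms (e.g. synonyms or
# hypernyms). A real system would query WordNet instead of this toy table.
TOY_LEXICON = {
    "car": ["automobile"],
    "auto": ["automobile"],
    "wheat": ["grain"],
    "corn": ["grain"],
}


def augment(tf, lexicon):
    """Original term frequencies plus external 'ext:' features from the lexicon."""
    out = dict(tf)
    for term, count in tf.items():
        for related in lexicon.get(term, []):
            key = "ext:" + related
            out[key] = out.get(key, 0) + count
    return out


def vectorize(doc, vocab):
    """Dense vector over a fixed vocabulary; terms outside vocab are dropped."""
    return [float(doc.get(t, 0)) for t in vocab]


def dist2(x, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(x, v))


def ssfcm(X_lab, y_lab, X_unl, c, m=2.0, w=1.0, iters=50, eps=1e-12):
    """Semi-supervised fuzzy c-means sketch (assumed formulation).

    Labeled memberships stay fixed at their crisp labels and are weighted by w
    in the centroid update; unlabeled memberships use the standard FCM rule.
    Assumes every class 0..c-1 has at least one labeled document."""
    dim = len(X_lab[0])
    U_lab = [[1.0 if y == k else 0.0 for k in range(c)] for y in y_lab]
    # Initialize each centroid as the mean of its labeled documents.
    V = []
    for k in range(c):
        members = [x for x, y in zip(X_lab, y_lab) if y == k]
        V.append([sum(col) / len(members) for col in zip(*members)])
    U_unl = []
    for _ in range(iters):
        # FCM membership on squared distances d: u_k = 1 / sum_j (d_k/d_j)^(1/(m-1)).
        U_unl = []
        for x in X_unl:
            d = [max(dist2(x, v), eps) for v in V]
            U_unl.append([1.0 / sum((d[k] / d[j]) ** (1.0 / (m - 1)) for j in range(c))
                          for k in range(c)])
        # Centroids: u^m-weighted mean of labeled (extra weight w) and unlabeled docs.
        V = []
        for k in range(c):
            num, den = [0.0] * dim, 0.0
            for Xs, Us, wt in ((X_lab, U_lab, w), (X_unl, U_unl, 1.0)):
                for x, u in zip(Xs, Us):
                    coef = wt * u[k] ** m
                    den += coef
                    num = [n + coef * xi for n, xi in zip(num, x)]
            V.append([n / den for n in num])
    return V, U_unl


def classify(x, V):
    """Assign x to the class of the nearest centroid."""
    return min(range(len(V)), key=lambda k: dist2(x, V[k]))


# Usage: class 0 = grain, class 1 = autos (toy term-frequency documents).
labeled = [{"wheat": 2, "harvest": 1}, {"car": 2, "engine": 1}]
y = [0, 1]
unlabeled = [{"corn": 2, "harvest": 1}, {"auto": 1, "engine": 2}]
docs = [augment(d, TOY_LEXICON) for d in labeled + unlabeled]
vocab = sorted({t for d in docs for t in d})
X = [vectorize(d, vocab) for d in docs]
V, U = ssfcm(X[:2], y, X[2:], c=2)
new_doc = vectorize(augment({"corn": 1}, TOY_LEXICON), vocab)
print(classify(new_doc, V))
```

The unseen document `{"corn": 1}` shares no original term with the labeled "wheat" document, but after augmentation it carries the `ext:grain` feature that the labeled document also has, and in this toy run it is assigned to class 0. Keeping external features under a separate `ext:` prefix mirrors the abstract's description of an original vector augmented with an external one, rather than merging related terms into the original features.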