Using unlabeled data to improve classification in the naive Bayes approach: Application to web searches

Authors:
Stella M. Salvatierra
Affiliations:
Facultad de Ciencias Economicas y Empresariales, Universidad de Navarra, Edificio Bibliotecas (Entrada Este), 31080 Pamplona, Spain. Tel.: +34 6150 588 06/ Fax: +34 9428 647 36/ E-mail: ssalvat@un ...
Venue:
Journal of Computational Methods in Sciences and Engineering - Computational and Mathematical Methods for Science and Engineering Conference 2002 - CMMSE-2002
Year:
2004

Abstract

This paper introduces a method for building a classifier from both labelled and unlabelled data. We set out the steps of the Expectation-Maximization (EM) algorithm for the particular case of the naive Bayes approach and report empirical work on a restricted web page database. The original contributions include applying the EM algorithm to simulated data, both to observe how the algorithm behaves for different numbers of labelled and unlabelled observations and to study how the sampling mechanism for the unlabelled data affects the results.
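
As an illustration of the kind of procedure the abstract describes, the following is a minimal sketch of EM for a naive Bayes classifier trained on labelled and unlabelled data. It is not the paper's implementation: the multinomial word-count model, the Laplace smoothing constant `alpha`, the fixed iteration count `n_iter`, and all function names are assumptions made for this example, since the abstract does not specify them.

```python
import numpy as np

def m_step(X, R, alpha=1.0):
    """M-step: re-estimate log priors and log word probabilities.

    X is an (n_docs, n_words) count matrix; R is an (n_docs, n_classes)
    matrix of class responsibilities (one-hot rows for labelled documents).
    alpha is an assumed Laplace smoothing constant.
    """
    n_classes = R.shape[1]
    priors = (R.sum(axis=0) + alpha) / (R.sum() + alpha * n_classes)
    counts = R.T @ X + alpha  # expected word counts per class
    word_probs = counts / counts.sum(axis=1, keepdims=True)
    return np.log(priors), np.log(word_probs)

def e_step(X, log_priors, log_word_probs):
    """E-step: posterior class probabilities under the naive Bayes assumption."""
    log_joint = X @ log_word_probs.T + log_priors
    log_joint = log_joint - log_joint.max(axis=1, keepdims=True)  # stability
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)

def em_naive_bayes(X_lab, y_lab, X_unl, n_classes, n_iter=20):
    """Fit naive Bayes on labelled data, then refine it with unlabelled data."""
    X_lab = np.asarray(X_lab, dtype=float)
    X_unl = np.asarray(X_unl, dtype=float)
    R_lab = np.eye(n_classes)[np.asarray(y_lab)]  # labelled rows stay fixed
    params = m_step(X_lab, R_lab)                 # initialise from labelled data
    X_all = np.vstack([X_lab, X_unl])
    for _ in range(n_iter):
        R_unl = e_step(X_unl, *params)            # soft labels for unlabelled rows
        params = m_step(X_all, np.vstack([R_lab, R_unl]))
    return params

def predict(X, params):
    """Assign each row of X to its most probable class."""
    return e_step(np.asarray(X, dtype=float), *params).argmax(axis=1)
```

A call such as `params = em_naive_bayes(X_lab, y_lab, X_unl, n_classes=2)` followed by `predict(X_new, params)` would classify new count vectors. Varying the number of rows in `X_lab` and `X_unl`, or how the rows of `X_unl` are sampled, is one way to reproduce the kind of simulation study the abstract mentions.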