Classification of Web Documents Using a Naive Bayes Method

  • Authors:
  • Yong Wang; Julia Hodges; Bo Tang

  • Venue:
  • ICTAI '03 Proceedings of the 15th IEEE International Conference on Tools with Artificial Intelligence
  • Year:
  • 2003

Abstract

This paper presents WebDoc, an automatic document classification system that classifies Web documents according to the Library of Congress classification scheme. WebDoc constructs a knowledge base from the training data and then classifies documents based on the information in that knowledge base. One of the classification algorithms used in WebDoc is based on Bayes' theorem from probability theory. This paper focuses on three aspects of this approach: different event models for the naive Bayes method, different probability smoothing methods, and different feature selection methods. We report the performance of each method in terms of recall, precision, and F-measure. Experimental results show that the WebDoc system can classify Web documents effectively and efficiently.
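
To make the approach concrete, the following is a minimal sketch of one combination the abstract names: a multinomial event model with add-one (Laplace) smoothing. It is not WebDoc's implementation. The function names `train` and `classify`, the `alpha` parameter, and the tokenized-input format are assumptions made for illustration, and feature selection is omitted.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Build a small knowledge base from (tokens, label) pairs.

    Hypothetical helper: returns class priors, per-class word counts,
    and the vocabulary seen in training.
    """
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    total = sum(class_docs.values())
    priors = {label: n / total for label, n in class_docs.items()}
    return priors, word_counts, vocab


def classify(tokens, priors, word_counts, vocab, alpha=1.0):
    """Return the class with the highest log-posterior score.

    Multinomial event model: each token occurrence contributes one factor.
    alpha is the additive smoothing constant (alpha=1.0 is Laplace smoothing).
    Tokens never seen in training are skipped.
    """
    best_label, best_score = None, -math.inf
    for label, prior in priors.items():
        counts = word_counts[label]
        denom = sum(counts.values()) + alpha * len(vocab)
        score = math.log(prior)
        for token in tokens:
            if token in vocab:
                score += math.log((counts[token] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Here the labels would be Library of Congress classes (for example `"QA"` for mathematics), and working in log space avoids underflow when many small probabilities are multiplied. A multivariate Bernoulli event model would instead score the presence or absence of every vocabulary word once per document.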