Heterogeneous Learner for Web Page Classification

  • Authors:
  • Hwanjo Yu;Kevin Chen-Chuan Chang;Jiawei Han

  • Venue:
  • ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
  • Year:
  • 2002

Abstract

Classification of an interesting class of Web pages (e.g., personal homepages, resume pages) has been an interesting problem. Typical machine learning algorithms for this problem require two classes of data for training: positive and negative training examples. However, in application to Web page classification, gathering an unbiased sample of negative examples appears to be difficult. We propose a heterogeneous learning framework for classifying Web pages, which (1) eliminates the need for negative training data, and (2) increases classification accuracy by using two heterogeneous learners. Our framework uses two heterogeneous learners, a decision list and a linear separator, which complement each other, to eliminate the need for negative training data in the training phase and to increase the accuracy in the testing phase. Our results show that our heterogeneous framework achieves high accuracy without requiring negative training data; it enhances the accuracy of linear separators by reducing the errors on "low-margin data". That is, it classifies more accurately while requiring less human effort in training.
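
The abstract does not spell out how the two learners are combined at test time. As a purely illustrative sketch, the following assumes one plausible scheme: the linear separator decides when its signed score is far from the boundary, and a decision list decides for "low-margin" points. The function names, the rule representation, and the `margin_threshold` parameter are all hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple

# A decision-list rule: (predicate over a feature vector, label to return).
# This representation is an assumption made for illustration only.
Rule = Tuple[Callable[[Sequence[float]], bool], int]


def linear_score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Signed score w.x + b of a linear separator."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def decision_list_predict(rules: List[Rule], default: int,
                          x: Sequence[float]) -> int:
    """Return the label of the first rule that fires, else the default."""
    for predicate, label in rules:
        if predicate(x):
            return label
    return default


def combined_predict(w: Sequence[float], b: float, rules: List[Rule],
                     default: int, x: Sequence[float],
                     margin_threshold: float = 0.5) -> int:
    """Hypothetical combination: trust the linear separator on
    high-margin points, fall back to the decision list otherwise."""
    score = linear_score(w, b, x)
    if abs(score) >= margin_threshold:
        return 1 if score > 0 else -1
    return decision_list_predict(rules, default, x)


# Toy usage with made-up weights and a single made-up rule.
w, b = [1.0, -1.0], 0.0
rules = [(lambda x: x[0] > 0.9, 1)]

high_margin = combined_predict(w, b, rules, -1, [2.0, 0.0])   # linear decides
low_margin = combined_predict(w, b, rules, -1, [0.95, 1.0])   # rule decides
no_rule = combined_predict(w, b, rules, -1, [0.2, 0.3])       # default label
```

In this toy setup the point `[0.95, 1.0]` has a slightly negative linear score, so the separator alone would label it negative, while the decision list overrides that label because the point lies inside the low-margin band. Whether the paper's framework routes low-margin points this way is not stated in the abstract.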