Heterogeneous Learner for Web Page Classification

Authors:
Hwanjo Yu;Kevin Chen-Chuan Chang;Jiawei Han
Venue:
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Year:
2002

Abstract

Classification of an interesting class of Web pages (e.g., personal homepages, resume pages) has been an interesting problem. Typical machine learning algorithms for this problem require two classes of data for training: positive and negative training examples. However, in application to Web page classification, gathering an unbiased sample of negative examples appears to be difficult. We propose a heterogeneous learning framework for classifying Web pages, which (1) eliminates the need for negative training data, and (2) increases classification accuracy by using two heterogeneous learners. Our framework uses two heterogeneous learners - a decision list and a linear separator which complement each other - to eliminate the need for negative training data in the training phase and to increase the accuracy in the testing phase. Our results show that our heterogeneous framework achieves high accuracy without requiring negative training data; it enhances the accuracy of linear separators by reducing the errors on "low-margin data". That is, it classifies more accurately while requiring less human effort in training.
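
The abstract does not spell out how the two learners are combined at test time, so the following is only a minimal sketch of one plausible reading of the "low-margin data" remark: trust the linear separator when its margin is large, and defer to a decision list when the margin is small. The function names, the threshold value, the feature weights, and the rules are all hypothetical and chosen for illustration; they are not taken from the paper, and the sketch does not cover the training phase without negative examples.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Page = Dict[str, float]                      # sparse feature vector (hypothetical)
Rule = Tuple[Callable[[Page], bool], int]    # (predicate, label), label in {+1, -1}


def linear_margin(weights: Dict[str, float], bias: float, page: Page) -> float:
    """Signed score w.x + b of a page under a linear separator."""
    return sum(weights.get(f, 0.0) * v for f, v in page.items()) + bias


def decision_list_predict(rules: Sequence[Rule], page: Page) -> Optional[int]:
    """Return the label of the first rule that fires, or None if none fires."""
    for predicate, label in rules:
        if predicate(page):
            return label
    return None


def classify(page: Page, weights: Dict[str, float], bias: float,
             rules: Sequence[Rule], threshold: float = 0.5) -> int:
    """Assumed combination: use the decision list only on low-margin pages."""
    margin = linear_margin(weights, bias, page)
    if abs(margin) >= threshold:
        return 1 if margin > 0 else -1       # confident linear decision
    fallback = decision_list_predict(rules, page)
    if fallback is not None:
        return fallback                      # decision list resolves the doubt
    return 1 if margin > 0 else -1           # no rule fired: keep linear sign


# Illustrative (made-up) model for a "resume page" class.
WEIGHTS = {"resume": 1.0, "education": 0.6, "price": -1.2}
BIAS = -0.5
RULES = [
    (lambda p: "cart" in p, -1),
    (lambda p: "education" in p and "experience" in p, 1),
]

confident_page = {"resume": 2.0, "education": 1.0}   # large positive margin
doubtful_page = {"education": 1.0, "cart": 1.0}      # small margin, rule fires

label_confident = classify(confident_page, WEIGHTS, BIAS, RULES)
label_doubtful = classify(doubtful_page, WEIGHTS, BIAS, RULES)
```

In this toy setup the second page has a small positive margin, so the linear separator alone would label it positive, while the first decision-list rule overrides that to negative. This is the kind of low-margin correction the abstract alludes to, under the assumptions stated above.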