Importance-based web page classification using cost-sensitive SVM

  • Authors:
  • Wei Liu;Gui-rong Xue;Yong Yu;Hua-jun Zeng

  • Affiliations:
  • Shanghai Jiao Tong University, Min Hang Shanghai, China;Shanghai Jiao Tong University, Min Hang Shanghai, China;Computer Science Department, Shanghai Jiao Tong University, Shanghai, China;Microsoft Research Asia, Beijing, China

  • Venue:
  • WAIM'05 Proceedings of the 6th international conference on Advances in Web-Age Information Management
  • Year:
  • 2005

Abstract

Web page classification faces great challenges because the web is a huge and diverse repository of information. Web pages vary in both content and quality, as PageRank suggests. Typical machine learning algorithms train a classifier from positive and negative examples, but they neglect that each instance can carry a different weight, which may be predefined by the user. This paper presents an effective algorithm based on the Cost-Sensitive Support Vector Machine (CS-SVM) to improve classification accuracy. During CS-SVM training, different cost factors are attached to the training errors to generate an optimized hyperplane. Our experiments show that CS-SVM outperforms SVM on the standard ODP data set. Web pages with relatively high PageRank values contribute most to the classifier, and training on them outperforms training on a random sample.
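The abstract does not give the CS-SVM formulation, so the following is only a minimal sketch of one common way to attach per-instance cost factors to training errors: a linear soft-margin SVM whose hinge loss for each example is multiplied by that example's cost. The objective assumed here is 0.5*||w||^2 + sum_i c_i * max(0, 1 - y_i*(w.x_i + b)), minimized by plain subgradient descent. The function names, the toy feature vectors, and the cost values (standing in for PageRank-derived importance scores) are all hypothetical and are not taken from the paper.

```python
import random


def train_cs_svm(X, y, costs, epochs=200, lr=0.01, seed=0):
    """Train a linear SVM with per-instance costs (illustrative sketch only).

    Assumed objective:
        0.5 * ||w||^2 + sum_i costs[i] * max(0, 1 - y[i] * (w . X[i] + b))
    Labels in y must be +1 or -1.
    """
    rng = random.Random(seed)
    n, dim = len(X), len(X[0])
    w = [0.0] * dim
    b = 0.0
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            # Regularizer gradient, split evenly across the n per-example steps.
            grad_w = [wj / n for wj in w]
            grad_b = 0.0
            if y[i] * score < 1:
                # Margin violation: the hinge subgradient is scaled by the
                # example's cost, so costly examples pull the hyperplane harder.
                grad_w = [g - costs[i] * y[i] * xj for g, xj in zip(grad_w, X[i])]
                grad_b = -costs[i] * y[i]
            w = [wj - lr * g for wj, g in zip(w, grad_w)]
            b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical toy data: two features per page, labels in {+1, -1}.
X = [[2.0, 2.0], [3.0, 1.0], [2.5, 3.0], [-2.0, -2.0], [-3.0, -1.0], [-2.5, -3.0]]
y = [1, 1, 1, -1, -1, -1]
# Hypothetical importance scores (e.g. normalized PageRank) used as cost factors.
costs = [0.9, 0.2, 0.5, 0.8, 0.3, 0.4]

w, b = train_cs_svm(X, y, costs)
predictions = [predict(w, b, x) for x in X]
```

Setting every entry of `costs` to the same constant reduces this sketch to an ordinary soft-margin linear SVM, which is a convenient baseline when comparing weighted and unweighted training.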