Importance-based web page classification using cost-sensitive SVM

Authors:
Wei Liu;Gui-rong Xue;Yong Yu;Hua-jun Zeng
Affiliations:
Shanghai Jiao Tong University, Min Hang Shanghai, China;Shanghai Jiao Tong University, Min Hang Shanghai, China;Computer Science Department, Shanghai Jiao Tong University, Shanghai, China;Microsoft Research Asia, Beijing, China
Venue:
WAIM'05 Proceedings of the 6th international conference on Advances in Web-Age Information Management
Year:
2005

Citing 14
Cited 1

Transformation and weighting in regression

Transformation and weighting in regression
A training algorithm for optimal margin classifiers

COLT '92 Proceedings of the fifth annual workshop on Computational learning theory
Enhanced hypertext categorization using hyperlinks

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
The anatomy of a large-scale hypertextual Web search engine

WWW7 Proceedings of the seventh international conference on World Wide Web 7
Making large-scale support vector machine learning practical

Advances in kernel methods
A re-examination of text categorization methods

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Least Squares Support Vector Machine Classifiers

Neural Processing Letters
An Evaluation of Statistical Approaches to Text Categorization

Information Retrieval
A Tutorial on Support Vector Machines for Pattern Recognition

Data Mining and Knowledge Discovery
Hierarchical Text Categorization Using Neural Networks

Information Retrieval
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features

ECML '98 Proceedings of the 10th European Conference on Machine Learning
Transductive Inference for Text Classification using Support Vector Machines

ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Building Text Classifiers Using Positive and Unlabeled Examples

ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Using urls and table layout for web classification tasks

Proceedings of the 13th international conference on World Wide Web

Web page classification: Features and algorithms

ACM Computing Surveys (CSUR)

Quantified Score

Hi-index	0.00

Visualization

Abstract

Web page classification is facing great challenges since there is a huge repository and diversity of information. As known, each web page varies both in content and quality, just as PageRank suggested. Typical machine learning algorithms take advantage of positive and negative examples to train a classifier; however, it has been neglected that each instance has a different weight, which can be user pre-defined. This paper presents an effective algorithm based on Cost-Sensitive Support Vector Machine (CS-SVM) to improve the accuracy of classification. During the training process of CS-SVM, different cost factors are attached on the training errors to generate an optimized hyperplane. Our experiments show that CS-SVM outperforms SVM on the standard ODP data set. The web pages with relative high PageRank values contribute most to the classifier and using them for training can exceed the random sampling technique.