Graph-based Semi-supervised Learning Algorithm for Web Page Classification

  • Authors:
  • Rong Liu; Jianzhong Zhou; Ming Liu

  • Affiliations:
  • Huazhong University of Science and Technology, China / Central China Normal University, China; Huazhong University of Science and Technology, China; Huazhong University of Science and Technology, China / Central China Normal University, China

  • Venue:
  • ISDA '06 Proceedings of the Sixth International Conference on Intelligent Systems Design and Applications - Volume 02
  • Year:
  • 2006

Abstract

Many application domains, such as web page classification, suffer from a shortage of labeled training examples. Unlabeled examples are readily available, whereas labeled ones are expensive to obtain. As a result, there has been a great deal of work in recent years on semi-supervised learning. This paper proposes a graph-based semi-supervised learning algorithm and applies it to web page classification. The algorithm uses a similarity measure between web pages to construct a K-Nearest Neighbor graph. Labeled and unlabeled web pages are represented as nodes in the weighted graph, with edge weights encoding the similarity between web pages. So that unlabeled data can help classification and improve accuracy, the edge weights are computed by combining weighting schemes with the link information of web pages. The learning problem is then formulated as label propagation in the graph: using probabilistic matrix methods and belief propagation, the labeled nodes push their labels out through the unlabeled nodes. Our preliminary experiments on the WebKB dataset show that the algorithm can effectively exploit unlabeled data in addition to labeled data to achieve higher web page classification accuracy.
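
The general idea in the abstract (build a K-Nearest Neighbor graph over pages, then let labeled nodes push labels to unlabeled ones) can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes plain cosine similarity as the edge weight (the paper additionally combines weighting schemes with link information, which is not reproduced here) and uses a standard clamped label propagation iteration rather than the paper's probabilistic matrix and belief propagation formulation. The function names `knn_graph` and `propagate_labels`, and all parameters, are hypothetical.

```python
import numpy as np


def knn_graph(X, k):
    """Build a symmetric kNN graph; edge weights are cosine similarities.

    Assumption: X holds one non-negative feature vector per page
    (for example term weights). Link information is NOT used here.
    """
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, 1e-12)      # unit-length rows
    S = Xn @ Xn.T                          # pairwise cosine similarity
    np.fill_diagonal(S, -np.inf)           # a node is never its own neighbor
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(S[i])[-k:]       # indices of the k most similar nodes
        W[i, nbrs] = S[i, nbrs]
    W = np.maximum(W, W.T)                 # keep an edge if either end chose it
    return np.maximum(W, 0.0)              # drop negative similarities


def propagate_labels(W, y, n_classes, n_iter=200):
    """Clamped label propagation; y uses -1 for unlabeled nodes."""
    idx = np.flatnonzero(y >= 0)           # positions of labeled nodes
    row_sums = W.sum(axis=1, keepdims=True)
    P = W / np.maximum(row_sums, 1e-12)    # row-normalized transition matrix
    F = np.zeros((len(y), n_classes))
    F[idx, y[idx]] = 1.0                   # one-hot scores for labeled nodes
    Y0 = F[idx].copy()
    for _ in range(n_iter):
        F = P @ F                          # each node averages its neighbors
        F[idx] = Y0                        # re-clamp the known labels
    return F.argmax(axis=1)
```

Calling `knn_graph(X, k)` on a feature matrix and passing the result to `propagate_labels` with a label vector that marks unlabeled pages as `-1` returns one predicted class per page. Clamping the labeled rows after every step is what keeps the known labels fixed while their influence spreads along high-weight edges.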