Web Page Classification Based on a Least Square Support Vector Machine with Latent Semantic Analysis

  • Authors:
  • Yong Zhang;Bin Fan;Long-bin Xiao

  • Venue:
  • FSKD '08 Proceedings of the 2008 Fifth International Conference on Fuzzy Systems and Knowledge Discovery - Volume 02
  • Year:
  • 2008

Abstract

Chinese web page classification (WPC) is an active research area in data mining. To classify web pages effectively, we present a web page categorization method based on a least squares support vector machine (LS-SVM) combined with latent semantic analysis (LSA). LSA uses singular value decomposition (SVD) to obtain the latent semantic structure of the original term-document matrix, which addresses the problem of polysemous and synonymous keywords. LS-SVM is an effective method for learning classification knowledge from massive data, especially when labeled examples are costly to obtain. We adopt a novel web page representation and use a summarization algorithm to reduce the noise in web pages. A preliminary experimental comparison shows encouraging results.
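The pipeline described in the abstract (SVD-based LSA followed by an LS-SVM classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy term-document matrix, the linear kernel, the regularization value `gamma`, and the function names `lsa_project`, `lssvm_train`, and `lssvm_predict` are all assumptions made for illustration, and the page representation and summarization steps are omitted. The LS-SVM part follows the standard formulation in which training reduces to solving one linear system.

```python
import numpy as np

def lsa_project(term_doc, k):
    """Rank-k LSA via SVD. Returns document coordinates and a fold-in function."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k, s_k = U[:, :k], s[:k]
    docs_k = Vt[:k, :].T * s_k          # rows are documents in latent space
    def fold_in(term_vec):
        # Project a new term vector onto the same k latent directions.
        return term_vec @ U_k
    return docs_k, fold_in

def lssvm_train(X, y, gamma=10.0):
    """Train an LS-SVM classifier (linear kernel assumed) by solving
    [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1]."""
    n = len(y)
    omega = np.outer(y, y) * (X @ X.T)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]              # bias b, multipliers alpha

def lssvm_predict(X_train, y, alpha, b, X_new):
    """Sign of sum_i alpha_i * y_i * <x, x_i> + b."""
    return np.sign((X_new @ X_train.T) @ (alpha * y) + b)

# Hypothetical toy corpus: rows are terms, columns are documents.
# Terms: football, match, goal, stock, market, bank
term_doc = np.array([
    [2, 1, 0, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [0, 1, 2, 0, 0, 0],
    [0, 0, 0, 2, 1, 0],
    [0, 0, 0, 1, 2, 1],
    [0, 0, 0, 0, 1, 2],
], dtype=float)
labels = np.array([1, 1, 1, -1, -1, -1], dtype=float)  # +1 sports, -1 finance

docs_k, fold_in = lsa_project(term_doc, k=2)
b, alpha = lssvm_train(docs_k, labels)

# Two unseen pages: one using "football" and "goal", one using "stock" and "bank".
new_pages = np.array([
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 1],
], dtype=float)
pred = lssvm_predict(docs_k, labels, alpha, b, fold_in(new_pages))
train_pred = lssvm_predict(docs_k, labels, alpha, b, docs_k)
```

In this sketch the classifier works in the reduced latent space rather than on raw term counts, so pages that share no exact vocabulary with a training page can still land near it. A nonlinear kernel could be substituted by replacing the two inner-product computations.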