A novel framework for web page classification using two-stage neural network

  • Authors:
  • Yunfeng Li;Yukun Cao;Qingsheng Zhu;Zhengyu Zhu

  • Affiliations:
  • Department of Computer Science, Chongqing University, Chongqing, P.R.China;Department of Computer Science, Chongqing University, Chongqing, P.R.China;Department of Computer Science, Chongqing University, Chongqing, P.R.China;Department of Computer Science, Chongqing University, Chongqing, P.R.China

  • Venue:
  • ADMA'05 Proceedings of the First international conference on Advanced Data Mining and Applications
  • Year:
  • 2005

Abstract

Web page classification is one of the essential techniques for Web mining. This paper presents a framework for Web page classification based on a hybrid architecture that combines a PCA (principal component analysis) neural network with an SOFM (self-organizing feature map). To perform the classification, a Web page is first represented by a vector of features whose weights reflect the term frequency and the importance of each sentence in the page. Because the number of features is large, PCA is used to select the relevant features. Finally, the output of PCA is sent to the SOFM for classification. For comparison with the proposed framework, two conventional classifiers are used in our experiments: k-NN and Naïve Bayes. On both data sets used in the experiments, the proposed method significantly improves classification over the two conventional methods.
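The three stages described in the abstract (weighted feature vector, PCA reduction, SOFM) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how sentence importance is computed, so the per-sentence weights here are assigned by hand; plain power iteration with deflation stands in for the PCA neural network; and the SOFM is a small one-dimensional map. All function names, parameters, and the toy pages are hypothetical.

```python
import math
import random


def weighted_term_vector(sentences, weights, vocab):
    """Term counts, each occurrence scaled by its sentence's importance weight."""
    index = {term: i for i, term in enumerate(vocab)}
    vec = [0.0] * len(vocab)
    for sentence, weight in zip(sentences, weights):
        for token in sentence.lower().split():
            if token in index:
                vec[index[token]] += weight
    return vec


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def fit_pca(rows, k, iters=200, seed=0):
    """Top-k principal directions via power iteration with deflation
    (an assumed stand-in for the PCA neural network)."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(d)]
           for a in range(d)]
    comps = []
    for _ in range(k):
        v = _normalize([rng.uniform(-1.0, 1.0) for _ in range(d)])
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            if max(abs(x) for x in w) < 1e-12:  # no variance left to extract
                break
            v = _normalize(w)
        # Remove the found direction so the next pass finds the next one.
        lam = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)]
               for a in range(d)]
        comps.append(v)
    return mean, comps


def project(row, mean, comps):
    """Coordinates of one feature vector in the reduced PCA space."""
    return [sum((x - m) * c for x, m, c in zip(row, mean, comp)) for comp in comps]


def bmu(units, x):
    """Index of the best-matching unit (smallest squared distance to x)."""
    return min(range(len(units)),
               key=lambda i: sum((u - a) ** 2 for u, a in zip(units[i], x)))


def train_sofm(data, n_units, epochs=50, seed=0):
    """One-dimensional SOFM with a Gaussian neighbourhood and linear decay."""
    rng = random.Random(seed)
    dim = len(data[0])
    units = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        lr = 0.5 * frac
        radius = max(n_units / 2.0 * frac, 0.5)
        for x in data:
            win = bmu(units, x)
            for i, unit in enumerate(units):
                h = math.exp(-((i - win) ** 2) / (2.0 * radius ** 2))
                for j in range(dim):
                    unit[j] += lr * h * (x[j] - unit[j])
    return units


# Toy pages: (sentences, hand-assigned sentence weights); a heavier first
# sentence is an assumption standing in for the paper's importance measure.
vocab = ["match", "goal", "stock", "market"]
pages = [
    (["Goal in the final match", "match report"], [2.0, 1.0]),
    (["Stock market falls", "market news"], [2.0, 1.0]),
    (["Late goal wins match"], [1.0]),
    (["Stock market rally"], [1.0]),
]
features = [weighted_term_vector(s, w, vocab) for s, w in pages]
mean, comps = fit_pca(features, k=2)
reduced = [project(f, mean, comps) for f in features]
units = train_sofm(reduced, n_units=2)
clusters = [bmu(units, r) for r in reduced]
```

Here `clusters` holds, for each toy page, the index of the SOFM unit it maps to. Turning units into class labels (for example, by the majority class of the training pages mapped to each unit) is a further assumption that the abstract does not spell out.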