Automatic Web Page Classification Using Various Features

  • Authors:
  • Hao Wen; Liping Fang; Ling Guan

  • Affiliations:
  • Department of Mechanical and Industrial Engineering, Ryerson University, Canada; Department of Mechanical and Industrial Engineering, Ryerson University, Canada; Department of Electrical and Computer Engineering, Ryerson University, Canada

  • Venue:
  • PCM '08 Proceedings of the 9th Pacific Rim Conference on Multimedia: Advances in Multimedia Information Processing
  • Year:
  • 2008

Abstract

A model for automatically classifying uncertain Web pages using multiple features is presented. Because the traditional tree structure can barely keep up with the avalanche of new Web pages, the proposed approach combines elements of the "bag of words" idea with classification fusion to describe and categorize Web pages. It extracts features of Web pages from three perspectives: consulting a Web directory service, analyzing the text features of the pages' titles and meta-search keywords, and identifying the primary content of the pages. By fusing the results of these three dedicated classifiers, each Web page is assigned to one or more categories, together with a set of words that represents it. Experiments are carried out to demonstrate the effectiveness of the proposed method; in them, Web pages are classified into four categories using the proposed fusion method. A comparison between the dedicated classifiers and the fusion methods is also presented.
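
The abstract does not state which fusion rule is used, so the following is only a minimal sketch of one common option: a weighted average of per-category scores from the three dedicated classifiers, followed by a threshold so that a page can receive more than one category. The function names `fuse_scores` and `assign_categories`, the weights, the threshold, and the category labels are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List

Scores = Dict[str, float]  # category name -> score in [0, 1]


def fuse_scores(classifier_scores: List[Scores], weights: List[float]) -> Scores:
    """Fuse per-category scores by weighted average (an assumed rule, not the paper's)."""
    if len(classifier_scores) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Collect every category mentioned by any classifier.
    categories = set().union(*classifier_scores)
    fused: Scores = {}
    for category in sorted(categories):
        # A classifier that says nothing about a category contributes 0.
        weighted = sum(
            w * scores.get(category, 0.0)
            for scores, w in zip(classifier_scores, weights)
        )
        fused[category] = weighted / total
    return fused


def assign_categories(fused: Scores, threshold: float = 0.5) -> List[str]:
    """Return every category whose fused score reaches the threshold, best first."""
    ranked = sorted(fused.items(), key=lambda item: -item[1])
    return [category for category, score in ranked if score >= threshold]


if __name__ == "__main__":
    # Hypothetical outputs of the three dedicated classifiers for one page.
    directory = {"sports": 1.0, "news": 0.0}     # Web directory lookup
    title_meta = {"sports": 0.5, "news": 0.5}    # title and meta keywords
    content = {"sports": 0.75, "news": 0.25}     # primary page content
    fused = fuse_scores([directory, title_meta, content], weights=[1.0, 1.0, 2.0])
    print(fused)
    print(assign_categories(fused))
```

Lowering the threshold lets a page fall into several categories at once, which matches the "one or more categories" behaviour described in the abstract; other fusion rules, such as majority voting, would fit the same interface.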