Improvement of web data clustering using web page contents

  • Authors:
  • Yue Xu; Li-Tung Weng

  • Affiliations:
  • School of Software Engineering and Data Communications, Queensland University of Technology, Brisbane, Australia (both authors)

  • Venue:
  • Intelligent Information Processing II
  • Year:
  • 2004

Abstract

This paper presents an approach that discovers clusters of Web pages based on both Web log data and Web page contents. Most existing Web log mining techniques are access-based: they statistically analyze the log data without paying much attention to the contents of the pages. However, log data contain various kinds of noise that can significantly degrade the performance of purely access-based Web log mining. The method proposed in this paper considers not only the frequency of page co-occurrence in user access logs but also the contents of the Web pages when clustering them. We also present a method that uses information entropy to prune away irrelevant pages, which improves the performance of Web page clustering.
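The abstract does not specify how co-occurrence and content evidence are combined, how entropy is computed, or which clustering algorithm is used. The following is a minimal illustrative sketch, not the authors' method, under these assumptions: a page's entropy is measured over the distribution of pages it co-occurs with in sessions, and pages above a threshold are pruned as "irrelevant"; co-occurrence similarity is the Jaccard overlap of the sessions containing two pages; content similarity is bag-of-words cosine similarity; the two are blended linearly and pages are grouped by single-link (connected-component) clustering. All function names (`page_entropy`, `prune_pages`, `cooccurrence_sim`, `content_sim`, `cluster_pages`) and parameters (`alpha`, `threshold`, `max_entropy`) are hypothetical.

```python
# Illustrative sketch only: NOT the method of Xu and Weng (2004).
# The entropy definition, the blended similarity, and the single-link
# clustering below are assumptions chosen for clarity; every function and
# parameter name here is hypothetical.
import math
from collections import Counter
from itertools import combinations


def page_entropy(page, sessions):
    """Entropy (bits) of the distribution of pages that co-occur with `page`.

    Assumption: a page that co-occurs evenly with many unrelated pages
    (e.g. a site home page) has high entropy and little clustering signal.
    """
    co = Counter()
    for s in sessions:
        if page in s:
            co.update(other for other in s if other != page)
    total = sum(co.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in co.values())


def prune_pages(pages, sessions, max_entropy):
    """Drop pages whose co-occurrence entropy exceeds `max_entropy`."""
    return {p for p in pages if page_entropy(p, sessions) <= max_entropy}


def cooccurrence_sim(p, q, sessions):
    """Jaccard overlap of the sets of sessions that contain p and q."""
    sp = {i for i, s in enumerate(sessions) if p in s}
    sq = {i for i, s in enumerate(sessions) if q in s}
    union = sp | sq
    return len(sp & sq) / len(union) if union else 0.0


def content_sim(p, q, contents):
    """Cosine similarity of bag-of-words term-frequency vectors."""
    a = Counter(contents[p].lower().split())
    b = Counter(contents[q].lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster_pages(pages, sessions, contents, alpha=0.5, threshold=0.5):
    """Single-link clustering on a blended similarity.

    Two pages are linked when alpha * co-occurrence + (1 - alpha) * content
    similarity reaches `threshold`; clusters are the connected components.
    """
    parent = {p: p for p in pages}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for p, q in combinations(sorted(pages), 2):
        sim = alpha * cooccurrence_sim(p, q, sessions) + (1 - alpha) * content_sim(
            p, q, contents
        )
        if sim >= threshold:
            parent[find(p)] = find(q)

    groups = {}
    for p in pages:
        groups.setdefault(find(p), set()).add(p)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Toy data: each session is the set of pages one user visited.
    sessions = [
        {"home", "jazz", "blues"},
        {"home", "jazz", "blues"},
        {"home", "python", "java"},
        {"home", "python", "java"},
    ]
    contents = {
        "home": "welcome to the site",
        "jazz": "jazz music records",
        "blues": "blues music records",
        "python": "python programming tutorial",
        "java": "java programming tutorial",
    }
    kept = prune_pages(set(contents), sessions, max_entropy=1.5)
    print(cluster_pages(kept, sessions, contents))
```

On this toy log, `home` co-occurs evenly with four pages, so its entropy is 2.0 bits and it is pruned at `max_entropy=1.5`; the remaining pages form the clusters `['blues', 'jazz']` and `['java', 'python']`. A linear blend weighted by `alpha` is only one simple way to combine log and content evidence; it makes the trade-off between the two sources explicit and easy to tune.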