Discovering conceptual page hierarchy of a web site from user traversal history

  • Authors:
  • Xia Chen; Minqiang Li; Wei Zhao; Ding-Yi Chen

  • Affiliations:
  • School of Electronic and Information Engineering, Tianjin University, Tianjin, P.R. China; School of Management, Tianjin University, Tianjin, P.R. China; School of Electronic and Information Engineering, Tianjin University, Tianjin, P.R. China; School of Information Technology and Electrical Engineering, University of Queensland, QLD, Australia

  • Venue:
  • ADMA'05: Proceedings of the First International Conference on Advanced Data Mining and Applications
  • Year:
  • 2005

Abstract

A Web site generally covers a wide range of topics, providing information for users with different access interests and goals. This information is not randomly scattered; it is organized under a hierarchy encoded in the hyperlink structure of the Web site, which is intended to shape the user's mental model of how the information is organized. User traversals over hyperlinks between Web pages, in turn, can reveal semantic relationships between those pages. Unfortunately, the link structure of a Web site, which represents the Web designer's expectations of visitors, may differ considerably from the organization that visitors to the site expect. Discovering the conceptual page hierarchy from the user's perspective can help Webmasters gain insight into the real relationships among Web pages and refine the link structure of the Web site to facilitate effective navigation. In this paper, we propose a method to generate a conceptual page hierarchy of a Web site on the basis of user traversal history. We use maximal forward references to model users' traversal behavior over the underlying link hierarchy of a Web site. We then build a weighted directed graph to represent the inter-relationships between Web pages. Finally, we apply a “Maximum Spanning Tree” (MST) algorithm to generate a conceptual page hierarchy of the Web site. We demonstrate the effectiveness of our approach with a preliminary experiment on real-world Web data.
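
The three steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the edge-weighting scheme or the exact MST procedure, so the sketch makes several assumptions: a session is an ordered list of page identifiers, a maximal forward reference ends at the first backward move (the usual definition of the term), an edge weight is the number of times a link is traversed inside maximal forward references, and the tree is grown with a Prim-style greedy rule from a chosen root page. All function names are hypothetical.

```python
from collections import defaultdict


def maximal_forward_references(session):
    """Split one session (ordered page ids) into maximal forward references."""
    refs, path, moving_forward = [], [], False
    for page in session:
        if page in path:
            # Backward move: record the path once, then back up to that page.
            if moving_forward:
                refs.append(list(path))
            del path[path.index(page) + 1:]
            moving_forward = False
        else:
            path.append(page)
            moving_forward = True
    if moving_forward:
        refs.append(list(path))
    return refs


def build_graph(sessions):
    """Assumed weighting: count link traversals inside forward references."""
    weights = defaultdict(int)
    for session in sessions:
        for ref in maximal_forward_references(session):
            for src, dst in zip(ref, ref[1:]):
                weights[(src, dst)] += 1
    return dict(weights)


def greedy_hierarchy(weights, root):
    """Prim-style growth from root: repeatedly attach the page reached by the
    heaviest edge leaving the current tree. Returns a child -> parent map."""
    parent = {root: None}
    while True:
        frontier = [
            (w, src, dst)
            for (src, dst), w in weights.items()
            if src in parent and dst not in parent
        ]
        if not frontier:
            return parent
        _, src, dst = max(frontier)  # ties broken by page id, for determinism
        parent[dst] = src


sessions = [
    ["home", "products", "laptop", "products", "home", "support", "laptop"],
    ["home", "support", "laptop"],
    ["home", "support", "laptop"],
]
graph = build_graph(sessions)
hierarchy = greedy_hierarchy(graph, "home")
```

In this toy log the site links "laptop" under "products", but most visitors reach it through "support", so the sketch places "laptop" under "support" in the derived hierarchy. The greedy rule always yields a tree over the pages reachable from the root, though on a directed graph it is not guaranteed to find the maximum-weight tree; Edmonds' algorithm would be the exact alternative.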