Extracting a website's content structure from its link structure

  • Authors: Nan Liu; Christopher C. Yang
  • Affiliations: The Chinese University of Hong Kong; The Chinese University of Hong Kong
  • Venue: Proceedings of the 14th ACM international conference on Information and knowledge management
  • Year: 2005

Abstract

Hierarchical models are commonly used to organize a Website's content. A Website's content structure can be represented by a topic hierarchy: a directed tree rooted at the Website's homepage, in which the vertices correspond to Web pages and the edges to hyperlinks. In this work, we propose an algorithm for extracting a Website's topic hierarchy from its link structure. The proposed algorithm consists of a construction stage and a refining stage, in which we analyze the semantic relationships between Web pages based on link structure, Web page content, and directory structure. We conducted extensive experiments on different Websites and obtained very promising results.
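
To make the topic-hierarchy representation concrete, the following is a minimal sketch of how a directed tree rooted at the homepage can be derived from a link graph. It is not the algorithm proposed in the paper: it uses a plain breadth-first traversal as a stand-in for the construction stage and omits the refining stage, page content, and directory structure entirely. The function name `bfs_hierarchy`, the `links` dictionary, and the example paths are all hypothetical.

```python
from collections import deque


def bfs_hierarchy(links, homepage):
    """Return a child -> parent map describing a tree rooted at `homepage`.

    `links` maps each page to the list of pages it links to.
    Assumption: breadth-first search is used here only as an
    illustrative baseline, not as the paper's method.
    """
    parent = {homepage: None}  # the root has no parent
    queue = deque([homepage])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            # Keep only the first hyperlink that reaches a page, so every
            # page ends up with exactly one parent and the result is a tree.
            if target not in parent:
                parent[target] = page
                queue.append(target)
    return parent


# Hypothetical link graph with navigation links pointing back up the site.
links = {
    "/": ["/products", "/about"],
    "/products": ["/products/a", "/", "/about"],
    "/about": ["/"],
    "/products/a": ["/products", "/"],
}

tree = bfs_hierarchy(links, "/")
for page, par in tree.items():
    print(page, "<-", par)
```

In this sketch, links that point back to already-visited pages (such as navigation links to the homepage) are discarded, which is the kind of choice a refining stage would revisit using additional evidence.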