Identifying a hierarchy of bipartite subgraphs for web site abstraction

  • Authors:
  • William K. Cheung; Yuxiang Sun

  • Affiliations:
  • Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong (corresponding e-mail: william@comp.hkbu.edu.hk); Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong

  • Venue:
  • Web Intelligence and Agent Systems
  • Year:
  • 2007

Abstract

The Web is transforming from a mere information dissemination platform into a distributed knowledge-based platform that supports complex problem solving. However, much of the knowledge on the existing Web is tagged only with layout-related markup, which makes it hard to discover and use. In this paper, we propose to model the semantically rich, self-contained knowledge units embedded in a web site as a mixture of bipartite subgraphs, and to extract these subgraphs as the web site abstraction through hyperlink structure and file hierarchy analysis. We derive a recursive algorithm, named ReHITS, that identifies bipartite subgraphs with a hierarchical organization. Each identified subgraph contains a set of associated authorities and hubs that serve as its summarized semantic description. The algorithm has been evaluated on three real web sites (containing ∼10,000 web pages), with promising results. A detailed interpretation of the experimental results and a qualitative comparison with related work are also included.
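
For readers unfamiliar with the hub and authority terminology used above, the following is a minimal sketch of the standard HITS iteration on a small link graph. It is not the ReHITS algorithm, whose recursive step and file hierarchy analysis are not specified in this abstract. The function names `hits` and `normalize`, the page names, and the iteration count are assumptions chosen for illustration.

```python
import math


def normalize(scores):
    """Scale a score dictionary to unit Euclidean length (no-op if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hits(edges, iterations=50):
    """Standard HITS on a directed graph given as (source, target) link pairs.

    Returns (hub, authority) score dictionaries. This is plain HITS only,
    shown as background; it is not the paper's ReHITS procedure.
    """
    pages = sorted({page for edge in edges for page in edge})
    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of pages that link in.
        new_auth = {page: 0.0 for page in pages}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # Hub update: sum the (new) authority scores of pages linked to.
        new_hub = {page: 0.0 for page in pages}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return hub, auth


# Hypothetical site fragment: two index pages each link to two content pages,
# forming a complete bipartite subgraph.
edges = [
    ("index1", "paper_a"), ("index1", "paper_b"),
    ("index2", "paper_a"), ("index2", "paper_b"),
]
hub, auth = hits(edges)
for page in sorted(hub):
    print(f"{page}: hub={hub[page]:.3f} authority={auth[page]:.3f}")
```

In this toy graph the index pages receive all of the hub weight and the content pages all of the authority weight, which is the bipartite hub/authority pattern the abstract refers to.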