Reasoning for web document associations and its applications in site map construction

  • Authors:
  • K. Selçuk Candan;Wen-Syan Li

  • Affiliations:
  • C & C Research Laboratories - Silicon Valley, NEC USA Inc., 10080 North Wolfe Road, Suite SW3-350, Cupertino, CA;C & C Research Laboratories - Silicon Valley, NEC USA Inc., 10080 North Wolfe Road, Suite SW3-350, Cupertino, CA

  • Venue:
  • Data & Knowledge Engineering
  • Year:
  • 2002

Abstract

Recently, there has been interest in using associations between Web pages to provide users with pages relevant to what they are currently viewing. We believe that, to enable intelligent decisions, we need to answer the question "for a given set of pages, why are they associated?". We present a framework for reasoning about Web document associations. We start from the observation that the reasons for Web page associations are implicitly embedded in the content of the pages as well as in the links connecting them. The association reasoning scheme we propose is based on a random walk algorithm. This algorithm takes both link structure and content into consideration and allows users to specify a focus. We then show how the proposed algorithm, combined with a logical domain identification technique, can be used for Web site summarization and Web site map construction to help users navigate complex corporate sites. To achieve this goal, it is essential to recover the Web authors' intentions and superimpose them on the users' retrieval contexts when summarizing Web sites. Therefore, we present a framework that uses logical neighborhoods, entry pages, and associations of entry pages to create context-sensitive summaries and maps of complex Web sites.
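
The abstract does not give the details of the random walk algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a standard random walk with restart: the walker follows links with probability proportional to a content relevance weight for the target page (standing in for the user's focus) and periodically jumps back to the given set of pages. The function name `association_scores`, its parameters, and the toy link graph are all hypothetical.

```python
def association_scores(links, relevance, seeds, restart=0.15, iterations=100):
    """Score pages by a random walk with restart from the seed pages.

    links:     dict mapping a page to the list of pages it links to
    relevance: dict mapping a page to a non-negative content weight
               (assumed to encode the user's focus; defaults to 1.0)
    seeds:     pages whose association is being examined
    """
    seeds = set(seeds)
    pages = set(links) | seeds
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)

    # The walker restarts uniformly over the seed pages.
    restart_dist = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    scores = dict(restart_dist)

    for _ in range(iterations):
        nxt = {p: restart * restart_dist[p] for p in pages}
        for p in pages:
            targets = links.get(p, [])
            weights = [relevance.get(q, 1.0) for q in targets]
            total = sum(weights)
            mass = (1.0 - restart) * scores[p]
            if total == 0:
                # No usable out-links: send the walker back to the seeds.
                for s in pages:
                    nxt[s] += mass * restart_dist[s]
            else:
                # Follow links in proportion to the target's content weight.
                for q, w in zip(targets, weights):
                    nxt[q] += mass * w / total
        scores = nxt
    return scores


# Toy site: pages "a" and "b" both link to "c", which links to "d" and "e".
links = {"a": ["c"], "b": ["c"], "c": ["d", "e"]}
relevance = {"d": 1.0, "e": 0.1}  # "d" matches the focus far better than "e"
scores = association_scores(links, relevance, seeds=["a", "b"])
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

Under these assumptions, pages that are reachable from all of the seed pages and that match the focus accumulate the most probability mass, which is one plausible way to surface candidates for "why" the seed pages are associated.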