Refinement of TF-IDF schemes for web pages using their hyperlinked neighboring pages

  • Authors:
  • Kazunari Sugiyama; Kenji Hatano; Masatoshi Yoshikawa; Shunsuke Uemura

  • Affiliations:
  • Nara Institute of Science and Technology, Nara, Japan; Nara Institute of Science and Technology, Nara, Japan; Nagoya University, Aichi, Japan; Nara Institute of Science and Technology, Nara, Japan

  • Venue:
  • Proceedings of the fourteenth ACM conference on Hypertext and hypermedia
  • Year:
  • 2003

Abstract

In information retrieval (IR) systems based on the vector space model, the TF-IDF scheme is widely used to characterize documents. However, for documents with hyperlink structures, such as Web pages, a technique is needed that represents the contents of a page more accurately by exploiting the contents of its hyperlinked neighboring pages. In this paper, we first propose several approaches to refining the TF-IDF scheme for a target Web page using the contents of its hyperlinked neighboring pages, and then compare the retrieval accuracy of these approaches. Experimental results show that, in general, more accurate feature vectors for a target Web page are obtained when the contents of its hyperlinked neighboring pages are used up to the second level in the backward direction from the target page.
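
The general idea described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's method: the abstract does not give the refinement formulas, so the weighting here is an assumption made purely for illustration. The sketch assumes that "backward direction" refers to pages that link to the target (in-links), that each level's contribution is averaged over the pages at that level, and that the contribution shrinks by a hypothetical `decay` factor per level. The function names (`tf_idf_vectors`, `backward_neighbors`, `refine`) and the toy data are likewise hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Plain TF-IDF: docs maps page id -> list of tokens."""
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # document frequency: count each term once per page
    vectors = {}
    for page, tokens in docs.items():
        tf = Counter(tokens)
        total = len(tokens) or 1
        vectors[page] = {t: (c / total) * math.log(n / df[t]) for t, c in tf.items()}
    return vectors


def backward_neighbors(target, in_links, max_level):
    """Breadth-first walk over in-links; returns {level: set of pages}."""
    seen = {target}
    frontier = {target}
    levels = {}
    for level in range(1, max_level + 1):
        next_frontier = set()
        for page in frontier:
            for source in in_links.get(page, ()):
                if source not in seen:
                    next_frontier.add(source)
        if not next_frontier:
            break
        seen |= next_frontier
        levels[level] = next_frontier
        frontier = next_frontier
    return levels


def refine(target, vectors, in_links, max_level=2, decay=0.5):
    """Add the level-averaged neighbor vectors to the target's vector.

    ASSUMPTION: the decay-per-level weighting is illustrative only and is
    not taken from the paper.
    """
    refined = dict(vectors[target])
    for level, pages in backward_neighbors(target, in_links, max_level).items():
        weight = (decay ** level) / len(pages)
        for page in pages:
            for term, value in vectors.get(page, {}).items():
                refined[term] = refined.get(term, 0.0) + weight * value
    return refined


# Toy example: page "b" links to "a", and "a" links to the target "t".
docs = {
    "t": ["hypertext", "retrieval"],
    "a": ["retrieval", "ranking"],
    "b": ["ranking", "links"],
    "c": ["cooking"],
}
in_links = {"t": ["a"], "a": ["b"]}  # page -> pages that link to it

vectors = tf_idf_vectors(docs)
refined = refine("t", vectors, in_links, max_level=2)
print(sorted(refined))
```

In this toy setup the refined vector for `t` picks up terms from `a` (level 1) and `b` (level 2) but nothing from the unlinked page `c`. Setting `max_level` to 1 or 2 makes it possible to compare how far back the neighborhood is taken, which is the kind of comparison the abstract reports.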