Reducing human interactions in Web directory searches

  • Authors:
  • Ori Gerstel;Shay Kutten;Eduardo Sany Laber;Rachel Matichin;David Peleg;Artur Alves Pessoa;Criston Souza

  • Affiliations:
  • Cisco, San Jose, CA;Technion, Haifa, Israel;PUC-Rio, Rio de Janeiro, Brazil;The Weizmann Institute of Science, Rehovot, Israel;The Weizmann Institute of Science, Rehovot, Israel;UFF, Niterói-RJ, Brazil;PUC-Rio, Rio de Janeiro, Brazil

  • Venue:
  • ACM Transactions on Information Systems (TOIS)
  • Year:
  • 2007

Abstract

Consider a website containing a collection of webpages with data, as in Yahoo or the Open Directory Project. Each page is associated with a weight representing the frequency with which that page is accessed by users. In the tree hierarchy representation, accessing a page requires the user to travel along the path leading to it from the root. By enhancing the index tree with additional edges (hotlinks), one may reduce the access cost of the system. In other words, the hotlinks reduce the expected number of steps needed to reach a leaf page from the tree root, assuming that the user knows which hotlinks to take. The hotlink enhancement problem involves finding a set of hotlinks that minimizes this cost. This article proposes the first exact algorithm for the hotlink enhancement problem. The algorithm runs in polynomial time for trees with logarithmic depth. Experiments conducted with real data show that a significant improvement in the expected number of accesses per search can be achieved in websites using this algorithm. These experiments also suggest that the simple and much faster heuristic proposed previously by Czyzowicz et al. [2003] creates hotlinks that are nearly optimal in the time savings they provide to the user. The version of the hotlink enhancement problem in which the weight distribution on the leaves is unknown is discussed as well. For this version, we present a polynomial-time algorithm that is optimal for any tree of any depth.
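
The cost model described in the abstract can be illustrated with a small sketch. It is not the article's algorithm: it only evaluates the expected number of steps for a given set of hotlinks, and it does not search for the best set. The function name `expected_cost`, the node labels, and the access counts are hypothetical, chosen for illustration. The sketch assumes a user who always follows the shortest available path, which is one reading of "the user knows which hotlinks to take".

```python
from collections import deque


def expected_cost(children, counts, hotlinks=(), root="root"):
    """Expected number of steps from the root to a leaf page.

    children: dict mapping each node to the list of its children (the index tree).
    counts:   dict mapping each leaf to its access count (the leaf weight).
    hotlinks: iterable of (source, target) pairs, the extra edges added to the tree.

    Assumption: the user always takes a shortest path, so the cost of a
    leaf is its breadth-first distance from the root in the enhanced graph.
    """
    # Copy the tree edges, then add the hotlinks as ordinary directed edges.
    adjacency = {node: list(kids) for node, kids in children.items()}
    for source, target in hotlinks:
        adjacency.setdefault(source, []).append(target)

    # Breadth-first search gives the minimum number of steps to every node.
    distance = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)

    # Weight each leaf's distance by its relative access frequency.
    total = sum(counts.values())
    return sum(count * distance[leaf] for leaf, count in counts.items()) / total


# Hypothetical four-leaf directory; leaf "x" is both deep and popular.
tree = {"root": ["a", "b"], "a": ["a1", "a2"], "a1": ["x", "y"], "b": ["z"]}
access_counts = {"x": 5, "y": 1, "a2": 1, "z": 3}

plain_cost = expected_cost(tree, access_counts)
hotlink_cost = expected_cost(tree, access_counts, hotlinks=[("root", "x")])
print(plain_cost, hotlink_cost)
```

In this toy instance, a single hotlink from the root to the most frequently accessed deep leaf lowers the expected cost from about 2.6 steps to about 1.6. The hotlink enhancement problem asks for the set of hotlinks that minimizes this quantity.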