A probabilistic approach to navigation in Hypertext

  • Authors:
  • Mark Levene; George Loizou

  • Affiliations:
  • University College London, Gower Street, London WC1E 6BT, UK; Birkbeck College, Department of Computer Science, Malet Street, London WC1E 7HX, UK

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 1999

Abstract

One of the main unsolved problems confronting Hypertext is the navigation problem, namely the problem of knowing where you are in the database graph representing the structure of a Hypertext database, and knowing how to get to some other place in that graph that you are searching for. Previously we formalised a Hypertext database in terms of a directed graph whose nodes represent pages of information. The notion of a trail, which is a path in the database graph describing some logical association amongst the pages in the trail, is central to our model. We defined a Hypertext Query Language, HQL, over Hypertext databases and showed that in general the navigation problem, i.e. the problem of finding a trail that satisfies an HQL query (technically known as the model checking problem), is NP-complete. Herein we present a preliminary investigation of using a probabilistic approach in order to enhance the efficiency of model checking. The flavour of our investigation is that if we have some additional statistical information about the Hypertext database then we can utilise such information during query processing. We present two different approaches. The first approach utilises the theory of probabilistic automata. In this approach we view a Hypertext database as a probabilistic automaton, which we call a Hypertext probabilistic automaton. In such an automaton we assume that the probability of traversing a link is determined by the usage statistics of that link. We exhibit a special case in which the number of trails that satisfy a query is always finite, and indicate how to give a finite approximation of the answer to a query in the general case. The second approach utilises the theory of random Turing machines. In this approach we view a Hypertext database as a probabilistic algorithm, realised via a Hypertext random automaton. In such an automaton we assume that, out of a choice of links, traversing any one of them is equally likely. We obtain a lower bound on the probability that a random trail satisfies a query. In principle, by iterating this probabilistic algorithm, associated with the Hypertext database, the probability of finding a trail that satisfies the query can be made arbitrarily large.
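
As a rough illustration of the second approach, the following Python sketch follows uniformly random links through a small database graph and repeats the walk until a trail satisfies a query. It is not the authors' construction: the adjacency-dictionary graph, the names `random_trail`, `find_trail` and `iterations_needed`, the `max_len` cut-off, and the `satisfies` predicate (a stand-in for checking an HQL query) are all assumptions made for this example. The amplification step relies only on the standard fact that if one independent random trail succeeds with probability at least p, then k trails succeed at least once with probability at least 1 - (1 - p)^k.

```python
import math
import random


def random_trail(graph, start, max_len, rng):
    """Follow uniformly random out-links from `start` for at most `max_len` steps."""
    trail = [start]
    node = start
    for _ in range(max_len):
        links = graph.get(node, [])
        if not links:  # dead end: no outgoing links to traverse
            break
        node = rng.choice(links)  # each available link is equally likely
        trail.append(node)
    return trail


def find_trail(graph, start, satisfies, max_len, iterations, seed=0):
    """Repeat the random walk; return the first trail accepted by `satisfies`.

    `satisfies` is a hypothetical predicate standing in for an HQL query check.
    Returns None if no sampled trail satisfies it.
    """
    rng = random.Random(seed)
    for _ in range(iterations):
        trail = random_trail(graph, start, max_len, rng)
        if satisfies(trail):
            return trail
    return None


def iterations_needed(p, target):
    """Smallest k with 1 - (1 - p)**k >= target, assuming 0 < p < 1 and 0 < target < 1."""
    return math.ceil(math.log(1 - target) / math.log(1 - p))


if __name__ == "__main__":
    # Toy database graph (assumed for illustration): pages as keys, links as lists.
    graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
    reaches_d = lambda trail: trail[-1] == "d"
    print(find_trail(graph, "a", reaches_d, max_len=5, iterations=50))
    print(iterations_needed(0.1, 0.99))
```

For the first approach, where link probabilities come from usage statistics, the same sketch could draw the next link with `random.choices` and per-link weights instead of `rng.choice`; the weights themselves would have to come from recorded usage data, which is not modelled here.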