A Probabilistic Approach for Distillation and Ranking of Web Pages

  • Authors:
  • Gianluigi Greco; Sergio Greco; Ester Zumpano

  • Affiliations:
  • DEIS, Università della Calabria, 87030 Rende, Italy (ggreco@si.deis.unical.it); DEIS, Università della Calabria, 87030 Rende, Italy (greco@si.deis.unical.it); DEIS, Università della Calabria, 87030 Rende, Italy (zumpano@si.deis.unical.it)

  • Venue:
  • World Wide Web
  • Year:
  • 2002

Abstract

A great number of recent papers have investigated more effective and efficient algorithms for search engines. Traditional search engines rank results using textual information only and, as shown by several works, such rankings are not very useful for extracting relevant information. Recent research instead takes a new approach, called Topic Distillation, whose main task is finding relevant documents using a different similarity criterion: the retrieved documents are those related to the query topic, even if they do not contain the query string. Current algorithms for topic distillation first compute a base set containing all the relevant pages and then obtain the authoritative pages by applying an iterative procedure. In this paper, we present a different approach, which computes the authoritative pages by analyzing the structure of the base set. The technique applies a statistical approach to the co-citation matrix of the base set to find the most co-cited pages, and combines link analysis with an evaluation of page content. Several experiments have shown the validity of our approach.
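
To make the notion of a co-citation matrix concrete, the following is a minimal sketch in Python and not the authors' method. It assumes the base set is given as a dictionary of outgoing links, counts for each pair of pages how many pages in the base set cite both, and ranks pages by their total co-citation count. The function names `cocitation_matrix` and `rank_by_cocitation`, the input format, the exclusion of self-links, and the plain sum used as a score are all illustrative assumptions; the statistical analysis and the content evaluation mentioned in the abstract are not reproduced here.

```python
from itertools import combinations


def cocitation_matrix(links, pages):
    """Build the co-citation matrix of a base set.

    links: dict mapping each page to the pages it links to (assumed format).
    pages: ordered list of the pages in the base set.
    Entry [i][j] (i != j) counts the pages that cite both pages i and j;
    entry [i][i] counts the pages that cite page i.
    """
    index = {p: i for i, p in enumerate(pages)}
    n = len(pages)
    c = [[0] * n for _ in range(n)]
    for source, targets in links.items():
        # Ignore targets outside the base set and self-links (an assumption).
        cited = sorted({index[t] for t in targets if t in index and t != source})
        for i in cited:
            c[i][i] += 1
        for i, j in combinations(cited, 2):
            c[i][j] += 1
            c[j][i] += 1
    return c


def rank_by_cocitation(links, pages):
    """Rank pages by total co-citation count (an illustrative score only)."""
    c = cocitation_matrix(links, pages)
    n = len(pages)
    scores = {
        p: sum(c[i][j] for j in range(n) if j != i)
        for i, p in enumerate(pages)
    }
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(pages, key=lambda p: (-scores[p], p))
```

For a base set in which pages `a` and `b` both link to `c` and `d`, the entry for the pair (`c`, `d`) is 2, since two pages cite both. In matrix terms, if A is the adjacency matrix of the base set with self-links removed, the same table is obtained as the product of the transpose of A with A.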