Probabilistic retrieval based on staged logistic regression

  • Authors: William S. Cooper; Fredric C. Gey; Daniel P. Dabney

  • Affiliations: S.L.I.S., University of California, Berkeley; S.L.I.S., University of California, Berkeley; G.S.L.I.S., University of California, Los Angeles

  • Venue: SIGIR '92: Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval

  • Year: 1992

Abstract

The goal of a probabilistic retrieval system design is to rank the elements of the search universe in descending order of their estimated probability of usefulness to the user. Previously explored methods for computing such a ranking have involved the use of statistical independence assumptions and multiple regression analysis on a learning sample. In this paper these techniques are recombined in a new way to achieve greater accuracy of probabilistic estimate without undue additional computational complexity. The novel element of the proposed design is that the regression analysis be carried out in two or more levels or stages. Such an approach allows composite or grouped retrieval clues to be analyzed in an orderly manner -- first within groups, and then between. It compensates automatically for systematic biases introduced by the statistical simplifying assumptions, and gives rise to search algorithms of reasonable computational efficiency.
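
The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' actual formulation: the clue groups, the synthetic data, the gradient-descent fitting routine, and all function names (`fit_logistic`, `fit_staged`, `predict_staged`) are assumptions made purely for illustration. Stage 1 fits one logistic regression per group of clues; stage 2 fits a second regression on the resulting per-group log-odds, which is one way the between-group combination could correct for bias left by treating the groups separately.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, steps=2000):
    """Fit logistic regression by plain gradient descent (bias stored first)."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w -= lr * (Xb.T @ (p - y)) / len(y)
    return w

def log_odds(X, w):
    """Linear predictor (estimated log-odds of relevance) for weights w."""
    return w[0] + X @ w[1:]

def group_scores(X, groups, stage1):
    """One column of stage-1 log-odds per clue group."""
    return np.column_stack(
        [log_odds(X[:, cols], w) for cols, w in zip(groups, stage1)]
    )

def fit_staged(X, y, groups):
    """Stage 1: regress within each group. Stage 2: regress between groups."""
    stage1 = [fit_logistic(X[:, cols], y) for cols in groups]
    stage2 = fit_logistic(group_scores(X, groups, stage1), y)
    return stage1, stage2

def predict_staged(X, groups, stage1, stage2):
    """Estimated probability of relevance from the two-stage model."""
    z = log_odds(group_scores(X, groups, stage1), stage2)
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic learning sample (assumed): 500 query-document pairs, 4 clues.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
true_logit = 1.5 * X[:, 0] - 1.0 * X[:, 2]
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

groups = [[0, 1], [2, 3]]  # hypothetical grouping of clues by column index
stage1, stage2 = fit_staged(X, y, groups)
probs = predict_staged(X, groups, stage1, stage2)

# Rank items in descending order of estimated probability of relevance.
ranking = np.argsort(-probs)
```

Fitting the second stage on the first-stage log-odds, rather than simply summing them, lets the learning sample decide how much weight each group receives. A real system would fit the stages on relevance judgments rather than on simulated labels.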