Documents and queries as random variables: History and implications: Research Articles

  • Authors:
  • David Bodoff; Samuel Po-Shing Wong

  • Affiliations:
  • Graduate School of Business, University of Haifa, Haifa, Israel; Department of Statistics, The Chinese University of Hong Kong, Hong Kong

  • Venue:
  • Journal of the American Society for Information Science and Technology
  • Year:
  • 2006

Abstract

The view of documents and/or queries as random variables is gaining importance in the theory of information retrieval. We argue that traditional probabilistic models already treat documents and queries as random variables, but that newer models, such as language modeling and our unified model, take this one step further. The additional step is called error in predictors: these models assume that we do not directly observe the document and query random variables that are used to predict relevance probabilistically. Rather, the observed documents and queries are themselves additional random variables. We discuss some important implications of this idea for parameter estimation, relevance prediction, and even test-collection construction. By clarifying where various probabilistic models stand on this question, and by presenting many of its implications in one place, this article aims to deepen our common understanding of the theories behind traditional probabilistic models and to strengthen the theoretical basis for further development of more recent approaches such as language modeling. © 2006 Wiley Periodicals, Inc.
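As a minimal sketch of the error-in-predictors idea, assuming a language-modeling setup: a latent term distribution for a document (the unobserved random variable) generates the observed document as a sample of term draws, so any estimate computed from the observed text is itself noisy. The vocabulary, probabilities, and function names below are hypothetical, chosen only for illustration. The Dirichlet-smoothed estimate is a standard language-modeling technique swapped in here to show one parameter-estimation consequence; it is not presented as the authors' method.

```python
import random
from collections import Counter

# Hypothetical vocabulary and latent document model (illustrative assumptions,
# not taken from the article). The latent model is the unobserved random variable.
VOCAB = ["retrieval", "probability", "query", "document", "model"]
LATENT_MODEL = {"retrieval": 0.40, "probability": 0.30, "query": 0.15,
                "document": 0.10, "model": 0.05}
# Hypothetical collection-wide background distribution used for smoothing.
BACKGROUND = {t: 1 / len(VOCAB) for t in VOCAB}


def sample_observed_document(latent_model, length, rng):
    """Generate an observed document as `length` i.i.d. term draws from the latent model."""
    terms = list(latent_model)
    weights = [latent_model[t] for t in terms]
    return rng.choices(terms, weights=weights, k=length)


def mle_estimate(observed_doc, vocab):
    """Maximum-likelihood estimate of the term distribution: relative term frequencies."""
    counts = Counter(observed_doc)
    n = len(observed_doc)
    return {t: counts[t] / n for t in vocab}


def dirichlet_estimate(observed_doc, vocab, background, mu):
    """Dirichlet-smoothed estimate: (count(t) + mu * p_bg(t)) / (n + mu)."""
    counts = Counter(observed_doc)
    n = len(observed_doc)
    return {t: (counts[t] + mu * background[t]) / (n + mu) for t in vocab}


rng = random.Random(42)  # fixed seed so the demo is repeatable
observed = sample_observed_document(LATENT_MODEL, length=8, rng=rng)
mle = mle_estimate(observed, VOCAB)
smoothed = dirichlet_estimate(observed, VOCAB, BACKGROUND, mu=10.0)

# Compare the latent (unobserved) probabilities with estimates from the observed text.
for term in VOCAB:
    print(f"{term:12s} latent={LATENT_MODEL[term]:.3f} "
          f"mle={mle[term]:.3f} smoothed={smoothed[term]:.3f}")
```

With only eight draws, the relative frequencies will typically differ from the latent probabilities, and a term with nonzero latent probability can receive a maximum-likelihood estimate of zero. Treating the observed document as if it were the latent variable ignores this sampling error; the smoothed estimate is one way of acknowledging it.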