Statistical models for unformatted text

  • Authors:
  • Christopher Landauer

  • Affiliations:
  • System Development Corporation, Santa Monica, California

  • Venue:
  • SIGIR '81 Proceedings of the 4th annual international ACM SIGIR conference on Information storage and retrieval: theoretical issues in information retrieval
  • Year:
  • 1981

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this note, we will describe some of the outstanding problems concerning statistical information retrieval models, and the underlying stochastic language production models they assume. The problems can be separated into classes according to the underlying language model, which can be either a sequence model or a grammar model. Both kinds of model are based on a stochastic process, but there is a different filter for the realization. The grammar models use a stochastic context sensitive grammar, and the sequence models use a high order Markov chain.Most of these problems cannot be solved without experimentation with information retrieval concepts and systems. Most information retrieval systems that currently exist have had to make operational assumptions about the answers to these questions. It is expected that more precise knowledge of solutions for these problems will simplify the design and improve the effectiveness of statistical information retrieval systems.