Statistical models for unformatted text

Authors:
Christopher Landauer
Affiliations:
System Development Corporation, Santa Monica, California
Venue:
SIGIR '81 Proceedings of the 4th annual international ACM SIGIR conference on Information storage and retrieval: theoretical issues in information retrieval
Year:
1981

Abstract

In this note, we will describe some of the outstanding problems concerning statistical information retrieval models, and the underlying stochastic language production models they assume. The problems can be separated into classes according to the underlying language model, which can be either a sequence model or a grammar model. Both kinds of model are based on a stochastic process, but there is a different filter for the realization. The grammar models use a stochastic context-sensitive grammar, and the sequence models use a high-order Markov chain.

Most of these problems cannot be solved without experimentation with information retrieval concepts and systems. Most information retrieval systems that currently exist have had to make operational assumptions about the answers to these questions. It is expected that more precise knowledge of solutions for these problems will simplify the design and improve the effectiveness of statistical information retrieval systems.
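
As an illustration of the sequence-model class only, the following is a minimal sketch of a word-level Markov chain of configurable order. It is not taken from the paper: the function names `train_markov` and `generate`, the toy corpus, and the choice of order 2 are all assumptions made for this example, and the grammar-model class is not illustrated.

```python
import random
from collections import defaultdict


def train_markov(tokens, order=2):
    """Record, for each run of `order` consecutive tokens, the tokens seen next.

    Hypothetical helper for illustration; duplicates are kept in the
    follower lists so that sampling reflects observed frequencies.
    """
    table = defaultdict(list)
    for i in range(len(tokens) - order):
        context = tuple(tokens[i:i + order])
        table[context].append(tokens[i + order])
    return dict(table)


def generate(table, seed, length, rng=None):
    """Extend `seed` by up to `length` tokens sampled from the chain.

    Generation stops early if the current context was never observed.
    """
    rng = rng or random.Random(0)  # fixed seed so runs are repeatable
    out = list(seed)
    order = len(seed)
    for _ in range(length):
        followers = table.get(tuple(out[-order:]))
        if not followers:
            break
        out.append(rng.choice(followers))
    return out


# Toy corpus, invented for this sketch.
corpus = "the model emits a word and the model emits a symbol".split()
table = train_markov(corpus, order=2)
sample = generate(table, seed=("the", "model"), length=5)
print(" ".join(sample))
```

Raising `order` makes each prediction depend on a longer history, which is what "high-order" refers to here; the cost is that the number of possible contexts grows quickly, so most contexts are observed rarely or never in a small corpus.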