The document as an ergodic Markov chain

Authors:
Eduard Hoenkamp; Dawei Song
Affiliations:
Nijmegen Institute for Cognition and Information, The Netherlands; -
Venue:
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2004

Abstract

In recent years, statistical language models have been proposed as an alternative to the vector space model. Viewing documents as language samples raises the issue of defining a joint probability distribution over the terms. The present paper models a document as the result of a Markov process. It argues that this process is ergodic, which is both theoretically plausible and easy to verify in practice. The theoretical consequence is that the joint distribution can be obtained easily. The approach also applies at search resolutions other than the document level. We verified this in a query-expansion experiment that demonstrates both the validity and the practicability of the method. This holds promise for general language models.
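To make the claim that ergodicity is "easy to verify in practice" concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' code, and the chain construction is our own illustrative choice: states are distinct terms, transition probabilities are row-normalised counts of adjacent token pairs, and the document is read as a circular sequence (the last token is followed by the first) so that no term is a dead end. "Ergodic" is taken in one common sense for finite chains, namely irreducible and aperiodic. The function names `build_chain`, `is_irreducible`, `period`, `is_ergodic` and `stationary` are hypothetical.

```python
# Illustrative sketch; the chain construction and all names are assumptions,
# not the paper's implementation.
from collections import defaultdict, deque
from math import gcd


def build_chain(tokens):
    """Estimate term-to-term transition probabilities from adjacent tokens.

    Assumption (illustrative only): the document is read as a circular
    sequence, so the last token is followed by the first and every term
    has at least one outgoing transition.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(tokens, tokens[1:] + tokens[:1]):
        counts[a][b] += 1
    states = list(dict.fromkeys(tokens))  # distinct terms, first-seen order
    P = {}
    for s, row in counts.items():
        total = sum(row.values())
        P[s] = {t: c / total for t, c in row.items()}  # row-normalised counts
    return states, P


def _bfs_levels(start, edges):
    """Breadth-first search; maps each reachable state to its distance from start."""
    level = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in edges.get(u, ()):
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    return level


def is_irreducible(states, P):
    """True if every state can reach every other: one state must reach all
    states both in the transition graph and in the reversed graph."""
    reverse = defaultdict(list)
    for u, row in P.items():
        for v in row:
            reverse[v].append(u)
    start = states[0]
    return (len(_bfs_levels(start, P)) == len(states)
            and len(_bfs_levels(start, reverse)) == len(states))


def period(states, P):
    """Period of an irreducible chain: the gcd, over all transitions u -> v,
    of level(u) + 1 - level(v), where levels are BFS distances from one state."""
    level = _bfs_levels(states[0], P)
    g = 0
    for u, row in P.items():
        for v in row:
            g = gcd(g, level[u] + 1 - level[v])
    return g


def is_ergodic(states, P):
    """Ergodic in the sense used here: irreducible and aperiodic (period 1)."""
    return is_irreducible(states, P) and period(states, P) == 1


def stationary(states, P, tol=1e-12, max_iter=100_000):
    """Stationary distribution by power iteration on the lazy chain (I + P) / 2.

    The lazy chain has the same stationary distribution as P but is aperiodic,
    so the iteration converges for any irreducible P, periodic or not.
    """
    pi = {s: 1 / len(states) for s in states}
    for _ in range(max_iter):
        nxt = {s: 0.5 * pi[s] for s in states}
        for u, row in P.items():
            for v, p in row.items():
                nxt[v] += 0.5 * pi[u] * p
        if max(abs(nxt[s] - pi[s]) for s in states) < tol:
            return nxt
        pi = nxt
    return pi


if __name__ == "__main__":
    for text in ("the cat sat on the mat", "a rose is a rose is a rose"):
        states, P = build_chain(text.split())
        print(text, "| period:", period(states, P), "| ergodic:", is_ergodic(states, P))
        print({s: round(x, 3) for s, x in stationary(states, P).items()})
```

Under the circular-reading assumption every such chain is irreducible, because the token sequence is itself a closed walk through all terms. Aperiodicity is therefore the property that actually needs checking. In "the cat sat on the mat" every cycle passes through "the" and has length 2 or 4, so the period is 2 and the chain is not ergodic in this sense. "a rose is a rose is a rose" has cycles of length 2 and 3, so its period is 1. In both cases the stationary distribution equals the relative term frequencies (1/3 for "the" and 1/6 for each other term in the first example), since with a circular reading every occurrence has exactly one predecessor and one successor. Iterating the lazy chain rather than P itself is a design choice that keeps the sketch convergent even for periodic inputs.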