A Linguistically Motivated Probabilistic Model of Information Retrieval

Authors: Djoerd Hiemstra
Affiliations: -
Venue: ECDL '98 Proceedings of the Second European Conference on Research and Advanced Technology for Digital Libraries
Year: 1998

Abstract

This paper presents a new probabilistic model of information retrieval. Its most important modeling assumption is that documents and queries are defined by an ordered sequence of single terms. Well-known existing models of information retrieval do not make this assumption, but it is essential in the field of statistical natural language processing. The paper uses advances already made in statistical natural language processing to formulate a probabilistic justification for tf×idf term weighting. It shows that this new probabilistic interpretation of tf×idf term weighting may lead to a better understanding of statistical ranking mechanisms, for example by explaining how they relate to coordination level ranking. A pilot experiment on the Cranfield test collection indicates that the presented model outperforms the vector space model with classical tf×idf and cosine length normalisation.
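
The abstract does not state the ranking formula, so the following is only a minimal sketch of how a term-sequence model can yield tf×idf-like weights. It assumes each query term is generated by a linear interpolation of a within-document estimate (term frequency over document length) and a collection estimate (document frequency over the sum of all document frequencies). The interpolation weight `lam`, the function names, and the toy documents are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def collection_stats(docs):
    """Return per-document term counts, document frequencies, and their sum."""
    doc_tfs = [Counter(d) for d in docs]
    # df[t] = number of documents that contain term t at least once.
    df = Counter(t for tf in doc_tfs for t in tf)
    return doc_tfs, df, sum(df.values())


def log_likelihood(query, docs, lam=0.5):
    """Log-probability of the query term sequence under each (non-empty) document.

    Assumed model: P(t | d) = (1 - lam) * df(t)/sum(df) + lam * tf(t, d)/len(d).
    Query terms absent from the whole collection are skipped.
    """
    doc_tfs, df, total_df = collection_stats(docs)
    scores = []
    for tf, d in zip(doc_tfs, docs):
        s = 0.0
        for t in query:
            if df[t] == 0:
                continue
            s += math.log((1 - lam) * df[t] / total_df + lam * tf[t] / len(d))
        scores.append(s)
    return scores


def matching_term_score(query, docs, lam=0.5):
    """Rank-equivalent score that sums weights over matching terms only.

    Each weight grows with tf(t, d) and shrinks with df(t), which is the
    tf x idf-like behaviour; non-matching terms contribute log(1) = 0.
    """
    doc_tfs, df, total_df = collection_stats(docs)
    scores = []
    for tf, d in zip(doc_tfs, docs):
        s = 0.0
        for t in query:
            if df[t] == 0:
                continue
            p_doc = tf[t] / len(d)
            p_coll = df[t] / total_df
            s += math.log(1 + (lam * p_doc) / ((1 - lam) * p_coll))
        scores.append(s)
    return scores


docs = [
    ["probabilistic", "model", "of", "retrieval"],
    ["vector", "space", "model"],
    ["digital", "libraries"],
]
query = ["probabilistic", "retrieval"]
full = log_likelihood(query, docs)
short = matching_term_score(query, docs)
```

Under these assumptions the two scores differ only by a document-independent constant, so they produce the same ranking. Because every matching query term adds a positive weight and every non-matching term adds zero, documents that match more query terms tend to score higher, which hints at the connection to coordination level ranking mentioned in the abstract.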