Empirical development of an exponential probabilistic model for text retrieval: using textual analysis to build a better model

Authors:
Jaime Teevan;David R. Karger
Affiliations:
MIT AI Lab, Cambridge, MA;MIT LCS, Cambridge, MA
Venue:
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Year:
2003

Citing 14
Cited 9

Probabilistic models in information retrieval

The Computer Journal - Special issue on information retrieval
Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval

SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Automatic feedback using past queries: social searching?

Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
A theory of term weighting based on exploratory data analysis

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Improving two-stage ad-hoc retrieval for short queries

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
A language modeling approach to information retrieval

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Learning to classify text from labeled and unlabeled documents

AAAI '98/IAAI '98 Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence
Towards multidocument summarization by reformulation: progress and prospects

AAAI '99/IAAI '99 Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh Innovative applications of artificial intelligence conference innovative applications of artificial intelligence
Information Retrieval

Information Retrieval
Title language model for information retrieval

SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Two-stage language models for information retrieval

SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval

ECML '98 Proceedings of the 10th European Conference on Machine Learning
A New Probabilistic Model of Text Classification and Retrieval TITLE2:

A New Probabilistic Model of Text Classification and Retrieval TITLE2:
Distribution of content words and phrases in text and language modelling

Natural Language Engineering

Discriminative models for information retrieval

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Relevance models to help estimate document and query parameters

ACM Transactions on Information Systems (TOIS)
Modeling word burstiness using the Dirichlet distribution

ICML '05 Proceedings of the 22nd international conference on Machine learning
Less is more: probabilistic models for retrieving fewer relevant documents

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Unsupervised learning of a finite discrete mixture: Applications to texture modeling and image databases summarization

Journal of Visual Communication and Image Representation
A study of Poisson query generation model for information retrieval

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Determining termhood for learning domain ontologies using domain prevalence and tendency

AusDM '07 Proceedings of the sixth Australasian conference on Data mining and analytics - Volume 70
The ineffectiveness of within-document term frequency in text classification

Information Retrieval
Modeling the evolution of context in information retrieval

FDIA'08 Proceedings of the 2nd BCS IRSG conference on Future Directions in Information Access

Quantified Score

Hi-index	0.00

Visualization

Abstract

Much work in information retrieval focuses on using a model of documents and queries to derive retrieval algorithms. Model based development is a useful alternative to heuristic development because in a model the assumptions are explicit and can be examined and refined independent of the particular retrieval algorithm. We explore the explicit assumptions underlying the naïve framework by performing computational analysis of actual corpora and queries to devise a generative document model that closely matches text. Our thesis is that a model so developed will be more accurate than existing models, and thus more useful in retrieval, as well as other applications. We test this by learning from a corpus the best document model. We find the learned model better predicts the existence of text data and has improved performance on certain IR tasks.