Two-stage language models for information retrieval

Authors:
ChengXiang Zhai; John Lafferty
Affiliations:
Carnegie Mellon University, Pittsburgh, PA; Carnegie Mellon University, Pittsburgh, PA
Venue:
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2002

Abstract

The optimal settings of retrieval parameters often depend on both the document collection and the query, and are usually found through empirical tuning. In this paper, we propose a family of two-stage language models for information retrieval that explicitly captures the different influences of the query and document collection on the optimal settings of retrieval parameters. As a special case, we present a two-stage smoothing method that allows us to estimate the smoothing parameters completely automatically. In the first stage, the document language model is smoothed using a Dirichlet prior with the collection language model as the reference model. In the second stage, the smoothed document language model is further interpolated with a query background language model. We propose a leave-one-out method for estimating the Dirichlet parameter of the first stage, and the use of document mixture models for estimating the interpolation parameter of the second stage. Evaluation on five different databases and four types of queries indicates that the two-stage smoothing method with the proposed parameter estimation methods consistently gives retrieval performance that is close to, or better than, the best results achieved using a single smoothing method and exhaustive parameter search on the test data.
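
The two stages described in the abstract can be read as a single smoothed word probability: a Dirichlet-smoothed document estimate, (c(w,d) + mu * p(w|C)) / (|d| + mu), linearly interpolated with weight lam against a query background model p(w|U). The following is a minimal Python sketch of that reading, not the authors' implementation. All function and parameter names (`two_stage_prob`, `mu`, `lam`, `p_query_bg`, and so on) are illustrative assumptions. The leave-one-out objective is one plausible form of the estimate mentioned in the abstract, and the grid search over candidate values is a simplification that stands in for whatever optimizer the paper uses. Estimating the interpolation parameter with document mixture models is not shown.

```python
import math
from collections import Counter


def two_stage_prob(word, doc_counts, doc_len, p_coll, p_query_bg, mu, lam):
    """Two-stage smoothed probability of `word` under one document model."""
    # Stage 1: Dirichlet prior smoothing, collection model as the reference.
    dirichlet = (doc_counts.get(word, 0) + mu * p_coll[word]) / (doc_len + mu)
    # Stage 2: linear interpolation with a query background model.
    return (1.0 - lam) * dirichlet + lam * p_query_bg[word]


def query_log_likelihood(query, doc, p_coll, p_query_bg, mu, lam):
    """Score a document by the log-likelihood of the query terms."""
    counts = Counter(doc)
    return sum(
        math.log(two_stage_prob(w, counts, len(doc), p_coll, p_query_bg, mu, lam))
        for w in query
    )


def leave_one_out_log_likelihood(docs, p_coll, mu):
    """Leave-one-out log-likelihood of the collection for a given mu (> 0).

    Each word occurrence is predicted from its document with that one
    occurrence removed, then smoothed with the Dirichlet prior.
    """
    total = 0.0
    for doc in docs:
        n = len(doc)
        for w, c in Counter(doc).items():
            total += c * math.log((c - 1 + mu * p_coll[w]) / (n - 1 + mu))
    return total


def estimate_mu(docs, p_coll, candidates):
    """Assumption: pick mu by grid search instead of a numerical optimizer."""
    return max(candidates, key=lambda mu: leave_one_out_log_likelihood(docs, p_coll, mu))


def collection_model(docs):
    """Maximum-likelihood unigram model over the whole collection."""
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}
```

A toy usage, with the collection model reused as the query background model purely for convenience:

```python
docs = [
    ["language", "model", "retrieval", "model"],
    ["query", "smoothing", "retrieval"],
    ["language", "smoothing", "prior", "prior"],
]
p_coll = collection_model(docs)
mu = estimate_mu(docs, p_coll, candidates=[0.5, 1, 5, 10, 50])
scores = [
    query_log_likelihood(["language", "model"], d, p_coll, p_coll, mu, lam=0.3)
    for d in docs
]
```

Keeping the two parameters separate mirrors the abstract's point: `mu` depends on the document collection, while `lam` is meant to reflect the query.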