Smoothing document language models with probabilistic term count propagation

Authors:
Azadeh Shakery;Chengxiang Zhai
Affiliations:
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, USA 61801;Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, USA 61801
Venue:
Information Retrieval
Year:
2008

Abstract

Smoothing of document language models is critical in language modeling approaches to information retrieval. In this paper, we present a novel way of smoothing document language models based on propagating term counts probabilistically in a graph of documents. A key difference between our approach and previous approaches is that our smoothing algorithm can propagate counts iteratively and thus achieve smoothing with remotely related documents. Evaluation results on several TREC data sets show that the proposed method significantly outperforms the simple collection-based smoothing method. Compared with other smoothing methods that also exploit local corpus structures, our method is especially effective in improving precision in top-ranked documents by "filling in" missing query terms in relevant documents. This is attractive because most users of search engine applications pay attention only to the top-ranked documents.
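
The idea of iterative count propagation can be illustrated with a small sketch. It is not the paper's actual formulation, which the abstract does not give. It assumes a document graph weighted by cosine similarity, a hypothetical interpolation parameter `alpha`, and a simple update that mixes each document's original counts with the current counts of its neighbors. The function names `propagate_counts` and `language_model` are invented for this illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count mappings."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def propagate_counts(docs, alpha=0.3, iterations=2):
    """Illustrative count propagation (assumed update rule, not the paper's).

    docs: list of token lists.
    alpha: assumed share of mass taken from neighbors at each step.
    Returns one dict of fractional term counts per document.
    """
    base = [Counter(d) for d in docs]
    n = len(base)

    # weights[i][j]: probability-like share of document j's counts
    # that flows into document i (row-normalised similarities).
    weights = []
    for i in range(n):
        sims = [cosine(base[i], base[j]) if i != j else 0.0 for j in range(n)]
        total = sum(sims)
        weights.append([s / total if total else 0.0 for s in sims])

    current = [dict(c) for c in base]
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Keep part of the original counts ...
            mixed = {w: (1 - alpha) * c for w, c in base[i].items()}
            # ... and add counts propagated from neighbors' current state.
            for j in range(n):
                if weights[i][j] == 0.0:
                    continue
                for w, c in current[j].items():
                    mixed[w] = mixed.get(w, 0.0) + alpha * weights[i][j] * c
            updated.append(mixed)
        current = updated
    return current


def language_model(counts):
    """Maximum-likelihood unigram model over (possibly fractional) counts."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


docs = [
    ["language", "model", "smoothing"],
    ["language", "model", "retrieval"],
    ["retrieval", "graph", "propagation"],
]
smoothed = propagate_counts(docs, alpha=0.3, iterations=2)
model_for_first_doc = language_model(smoothed[0])
```

In this toy collection the first and third documents share no terms, so a single propagation step gives the first document mass for "retrieval" only. With a second step, "graph" reaches it through the intermediate document, which is the kind of smoothing with remotely related documents that the abstract describes. A full retrieval model would still combine these counts with collection-based smoothing, which the sketch omits.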