The opposite of smoothing: a language model approach to ranking query-specific document clusters

Authors:
Oren Kurland; Eyal Krikon
Affiliations:
Faculty of Industrial Engineering and Management, Technion, Israel Institute of Technology; Faculty of Industrial Engineering and Management, Technion, Israel Institute of Technology
Venue:
Journal of Artificial Intelligence Research
Year:
2011

Abstract

Exploiting information induced from (query-specific) clustering of top-retrieved documents has long been proposed as a means of improving precision at the very top ranks of the returned results. We present a novel language model approach to ranking query-specific clusters by the presumed percentage of relevant documents that they contain. While most previous cluster ranking approaches focus on the cluster as a whole, our model also uses information induced from the documents associated with the cluster. Our model substantially outperforms previous approaches for identifying clusters containing a high relevant-document percentage. Furthermore, using the model to produce a document ranking yields precision-at-top-ranks performance that is consistently better than that of the initial ranking upon which clustering is performed. The performance also compares favorably with that of a state-of-the-art pseudo-feedback-based retrieval method.
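To make the idea of combining cluster-level and document-level evidence concrete, the following is a minimal sketch and not the model from the paper. It assumes Dirichlet-smoothed unigram language models, treats a cluster as the concatenation of its documents, and linearly interpolates the cluster's query log-likelihood with the mean query log-likelihood of its member documents. The function names, the interpolation weight `lam`, the smoothing parameter `mu`, and the toy data are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def query_log_likelihood(query, tokens, coll_counts, coll_len, mu=1000.0):
    """Dirichlet-smoothed unigram log-likelihood of the query given a token list."""
    counts = Counter(tokens)
    score = 0.0
    for term in query:
        p_coll = coll_counts.get(term, 0) / coll_len
        if p_coll == 0.0:
            continue  # assumption: ignore query terms absent from the collection
        score += math.log((counts[term] + mu * p_coll) / (len(tokens) + mu))
    return score


def score_cluster(query, member_docs, coll_counts, coll_len, lam=0.5):
    """Interpolate whole-cluster evidence with evidence from member documents."""
    # Cluster as a whole: concatenate the member documents.
    whole = [tok for doc in member_docs for tok in doc]
    cluster_part = query_log_likelihood(query, whole, coll_counts, coll_len)
    # Documents associated with the cluster: average their individual scores.
    doc_part = sum(
        query_log_likelihood(query, doc, coll_counts, coll_len)
        for doc in member_docs
    ) / len(member_docs)
    return lam * cluster_part + (1.0 - lam) * doc_part


def rank_clusters(query, docs, clusters, lam=0.5):
    """Return cluster names ordered from highest to lowest score.

    docs: dict mapping document id -> list of tokens
    clusters: dict mapping cluster name -> list of document ids
    """
    coll_counts = Counter(tok for toks in docs.values() for tok in toks)
    coll_len = sum(coll_counts.values())
    scores = {
        name: score_cluster(
            query, [docs[d] for d in ids], coll_counts, coll_len, lam
        )
        for name, ids in clusters.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy usage with made-up documents and clusters.
docs = {
    "d1": "jaguar car engine speed".split(),
    "d2": "jaguar car dealer price".split(),
    "d3": "jaguar cat jungle habitat".split(),
    "d4": "cat jungle animal prey".split(),
}
clusters = {"cars": ["d1", "d2"], "animals": ["d3", "d4"]}
print(rank_clusters(["jaguar", "car"], docs, clusters))  # ['cars', 'animals']
```

Setting `lam=1.0` in this sketch scores each cluster purely as a whole, while lower values give more weight to the individual documents. A document ranking could then be derived by listing the documents of higher-ranked clusters first, which is again only one possible design.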