Inter-document similarities, language models, and ad hoc information retrieval

Authors:
Lillian Lee;Oren Kurland
Affiliations:
Cornell University;Cornell University
Venue:
Inter-document similarities, language models, and ad hoc information retrieval
Year:
2006

Abstract

Search engines have become a crucial tool for finding information in repositories containing large amounts of textual data in unstructured form (e.g., the Web). However, the task of ad hoc information retrieval, that is, finding documents within a corpus that are relevant to an information need specified using a query, remains a hard challenge. The language modeling approach to information retrieval provides an effective framework for approaching various problems and has yielded impressive empirical performance. However, most previous work on language models for information retrieval focuses on document-specific characteristics to estimate documents' language models, and therefore does not take into account the structure of the surrounding corpus, a potentially rich source of additional information.

We present a novel perspective for approaching the task of ad hoc retrieval: information provided by document-based language models can be enhanced by the incorporation of information drawn from clusters of similar documents that are created offline. We present several retrieval algorithms that are natural instantiations of this idea and that post performance that is substantially better than that of the standard language modeling approach. We also show that the best performing of these algorithms posts state-of-the-art performance for structural re-ranking, wherein an initially retrieved subset of the documents is re-ranked to obtain high precision specifically among the first few documents, using inter-document similarities within the list as an extra information source.

As further exploration of the re-ranking approach just described, and inspired by the PageRank and HITS (hubs and authorities) algorithms for Web search, we propose a graph-based framework that applies to document collections lacking hyperlink information.
Specifically, centrality induced over graphs wherein links represent asymmetric language-model-based inter-document similarities constitutes the basis of effective re-ranking algorithms. Combining our two paradigms for similarity representation (i.e., clusters of documents and links representing language-model-based inter-item similarities) helps to improve the effectiveness of centrality-based approaches. For example, document "authoritativeness" as induced by the HITS algorithm over cluster-document graphs is a highly effective re-ranking criterion. Furthermore, "authoritative" clusters are shown to contain a high percentage of relevant documents.
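
The graph-based re-ranking idea can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name and setting in it is an assumption made for illustration: the functions `similarity` and `centrality_rerank`, the parameters `mu`, `top_k`, and `damping`, the use of exp(-KL) between a maximum-likelihood model and a Dirichlet-smoothed model as the asymmetric similarity, and the choice to link each document to its `top_k` most similar neighbours. Centrality is computed with a PageRank-style power iteration over the induced graph.

```python
import math
from collections import Counter


def mle(tokens):
    """Maximum-likelihood unigram model of a token list."""
    counts = Counter(tokens)
    return {w: c / len(tokens) for w, c in counts.items()}


def similarity(source, target, collection, mu=10.0):
    """Asymmetric similarity: exp(-KL(MLE(source) || smoothed LM(target))).

    The target model is Dirichlet-smoothed with the collection model
    (an illustrative choice; mu is a hypothetical setting).
    """
    counts = Counter(target)
    kl = 0.0
    for w, p in mle(source).items():
        q = (counts[w] + mu * collection[w]) / (len(target) + mu)
        kl += p * math.log(p / q)
    return math.exp(-kl)


def centrality_rerank(docs, top_k=2, damping=0.85, iterations=50):
    """Rank documents by PageRank-style centrality over induced links.

    Expects at least two non-empty documents and top_k >= 1.
    """
    n = len(docs)
    collection = mle([w for d in docs for w in d])

    # weights[i][j] is the weight of the edge i -> j; each document
    # links only to the top_k documents most similar to it.
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sims = sorted(
            ((similarity(docs[i], docs[j], collection), j)
             for j in range(n) if j != i),
            reverse=True,
        )
        for s, j in sims[:top_k]:
            weights[i][j] = s

    # Normalise outgoing weights so that each row sums to one.
    for row in weights:
        total = sum(row)
        for j in range(n):
            row[j] /= total

    # Power iteration with uniform teleportation.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(scores[i] * weights[i][j] for i in range(n))
            for j in range(n)
        ]

    order = sorted(range(n), key=lambda j: scores[j], reverse=True)
    return order, scores


# Toy "initially retrieved list": three related documents and one outlier.
docs = [
    "language model retrieval ranking".split(),
    "language model retrieval clusters".split(),
    "language model ranking clusters".split(),
    "pasta recipe cooking sauce".split(),
]
order, scores = centrality_rerank(docs)
print(order, [round(s, 3) for s in scores])
```

In this toy list no document links to the outlier, so it receives only the teleportation mass and is ranked last. A cluster-document variant, as described in the abstract, would instead place clusters and documents on the two sides of a bipartite graph and use HITS authority scores.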