Search engines have become a crucial tool for finding information in repositories containing large amounts of textual data in unstructured form (e.g., the Web). However, the task of ad hoc information retrieval, that is, finding documents within a corpus that are relevant to an information need specified using a query, remains a hard challenge. The language modeling approach to information retrieval provides an effective framework for approaching various problems and has yielded impressive empirical performance. However, most previous work on language models for information retrieval focuses on document-specific characteristics to estimate documents' language models, and therefore does not take into account the structure of the surrounding corpus, a potentially rich source of additional information. We present a novel perspective for approaching the task of ad hoc retrieval: information provided by document-based language models can be enhanced by the incorporation of information drawn from clusters of similar documents that are created offline. We present several retrieval algorithms that are natural instantiations of this idea and that post performance that is substantially better than that of the standard language modeling approach. We also show that the best performing of these algorithms posts state-of-the-art performance for structural re-ranking, wherein an initially retrieved subset of the documents is re-ranked to obtain high precision specifically among the first few documents, using inter-document similarities within the list as an extra information source. As further exploration of the re-ranking approach just described, and inspired by the PageRank and HITS (hubs and authorities) algorithms for Web search, we propose a graph-based framework that applies to document collections lacking hyperlink information. 
Specifically, centrality induced over graphs wherein links represent asymmetric language-model-based inter-document similarities constitutes the basis of effective re-ranking algorithms. Combining our two paradigms for similarity representation (i.e., clusters of documents and links representing language-model-based inter-item similarities) helps to improve the effectiveness of centrality-based approaches. For example, document "authoritativeness" as induced by the HITS algorithm over cluster-document graphs is a highly effective re-ranking criterion. Furthermore, "authoritative" clusters are shown to contain a high percentage of relevant documents.
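The graph-based re-ranking idea can be illustrated with a minimal sketch. It is not the exact estimate or algorithm evaluated in this work. It assumes additively smoothed unigram language models, uses exp(-KL) as the asymmetric similarity, links each document to its `top_k` most similar documents, and scores documents with a PageRank-style power iteration. All function names, the toy documents, and the parameter values (`mu`, `top_k`, `damping`) are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def unigram_lm(tokens, vocab, mu=1.0):
    """Additively smoothed unigram model over a fixed vocabulary (assumed estimate)."""
    counts = Counter(tokens)
    total = len(tokens) + mu * len(vocab)
    return {w: (counts[w] + mu) / total for w in vocab}


def kl_similarity(p, q):
    """exp(-KL(p || q)): an asymmetric similarity in (0, 1] (illustrative choice)."""
    kl = sum(p[w] * math.log(p[w] / q[w]) for w in p)
    return math.exp(-kl)


def build_graph(docs, top_k=2):
    """Link each document to its top_k most similar documents in the list."""
    vocab = sorted({w for d in docs for w in d})
    lms = [unigram_lm(d, vocab) for d in docs]
    n = len(docs)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sims = [(kl_similarity(lms[i], lms[j]), j) for j in range(n) if j != i]
        for s, j in sorted(sims, reverse=True)[:top_k]:
            weights[i][j] = s
    return weights


def pagerank(weights, damping=0.85, iters=100):
    """Weighted PageRank by power iteration; rows with no out-links jump uniformly."""
    n = len(weights)
    scores = [1.0 / n] * n
    for _ in range(iters):
        new = [(1.0 - damping) / n] * n
        for i in range(n):
            out = sum(weights[i])
            for j in range(n):
                if out > 0:
                    new[j] += damping * scores[i] * weights[i][j] / out
                else:
                    new[j] += damping * scores[i] / n
        scores = new
    return scores


# Toy "initially retrieved list": three on-topic documents and one off-topic one.
docs = [
    "language model retrieval".split(),
    "language model smoothing retrieval".split(),
    "cluster language model retrieval".split(),
    "cooking pasta recipe".split(),
]
scores = pagerank(build_graph(docs, top_k=2))
reranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

In this toy list the off-topic document receives no incoming links, so it ends up with the lowest centrality and drops to the bottom of `reranked`. A HITS-style variant over a cluster-document graph would replace the single score vector with alternating hub and authority updates.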