Evaluating text representations for retrieval of the best group of documents

Authors:
Xiaoyong Liu; W. Bruce Croft
Affiliations:
CIIR, Computer Science Department, University of Massachusetts, Amherst, MA; CIIR, Computer Science Department, University of Massachusetts, Amherst, MA
Venue:
ECIR'08 Proceedings of the IR research, 30th European conference on Advances in information retrieval
Year:
2008

Abstract

Cluster retrieval assumes that the probability that a document is relevant to a query should depend on the relevance of other, similar documents to that query. The goal is to find the best group of documents. Many studies have examined the effectiveness of this approach using different retrieval methods or clustering algorithms, but few have investigated text representations. This paper revisits the problem of retrieving the best group of documents from a language-modeling perspective. We analyze the advantages and disadvantages of a range of representation techniques, derive features that characterize good document groups, and experiment with a new probabilistic representation as a first step toward incorporating these features. Empirical evaluation demonstrates that relationships between documents can be leveraged in retrieval when a good representation technique is available, and that retrieving the best group of documents can be more effective than retrieving individual documents.
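
As a rough illustration of ranking document groups from a language-modeling perspective, the sketch below scores each group by the query likelihood under a smoothed group language model. It is not the representation proposed in the paper: it assumes the simplest common baseline, in which a group is represented by concatenating its member documents, and it assumes Dirichlet smoothing against the collection. The names `language_model`, `query_log_likelihood`, `rank_groups`, and the parameter `mu` are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def language_model(tokens, collection_counts, collection_len, mu=10.0):
    """Return a function mapping a term to its Dirichlet-smoothed probability."""
    counts = Counter(tokens)
    length = len(tokens)

    def prob(term):
        # Background (collection) probability; the small floor avoids log(0)
        # for terms that never occur in the collection.
        p_bg = max(collection_counts.get(term, 0) / collection_len, 1e-9)
        return (counts[term] + mu * p_bg) / (length + mu)

    return prob


def query_log_likelihood(query, prob):
    """Sum of log probabilities of the query terms under one language model."""
    return sum(math.log(prob(term)) for term in query)


def rank_groups(query, groups, docs, mu=10.0):
    """Rank groups of documents by query likelihood, best group first.

    Assumption: a group is represented by the concatenation of its documents.
    """
    all_tokens = [t for tokens in docs.values() for t in tokens]
    collection_counts = Counter(all_tokens)
    scored = []
    for name, doc_ids in groups.items():
        group_tokens = [t for doc_id in doc_ids for t in docs[doc_id]]
        lm = language_model(group_tokens, collection_counts, len(all_tokens), mu)
        scored.append((query_log_likelihood(query, lm), name))
    return [name for _, name in sorted(scored, reverse=True)]


# Toy data, invented for illustration only.
docs = {
    "d1": ["cluster", "retrieval", "language", "model"],
    "d2": ["cluster", "hypothesis", "retrieval"],
    "d3": ["cooking", "pasta", "recipe"],
    "d4": ["pasta", "sauce"],
}
groups = {"ir": ["d1", "d2"], "food": ["d3", "d4"]}
print(rank_groups(["cluster", "retrieval"], groups, docs))
```

Concatenation is only one possible group representation; swapping the line that builds `group_tokens` for a different combination of the member documents is where an alternative representation would plug in.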