Query-sensitive similarity measures for information retrieval

Authors:
Anastasios Tombros; C. J. van Rijsbergen
Affiliations:
Department of Computing Science, University of Glasgow, Glasgow G12 8QQ, Scotland (both authors)
Venue:
Knowledge and Information Systems
Year:
2004

Abstract

The application of document clustering to information retrieval has been motivated by the potential effectiveness gains postulated by the cluster hypothesis. The hypothesis states that relevant documents tend to be highly similar to each other and therefore tend to appear in the same clusters. In this paper we propose an axiomatic view of the hypothesis by suggesting that documents relevant to the same query (co-relevant documents) display an inherent similarity to each other that is dictated by the query itself. Because of this inherent similarity, the cluster hypothesis should be valid for any document collection. Our research is an attempt to devise means by which this similarity can be detected. We propose the use of query-sensitive similarity measures that bias interdocument relationships toward pairs of documents that jointly possess attributes expressed in a query. We experimentally tested three query-sensitive measures against conventional measures that do not take the query into account, and we also compared the effectiveness of the three query-sensitive measures with one another. We calculated interdocument relationships for varying numbers of top-ranked documents in six document collections. Our results show a consistent and significant increase in the number of relevant documents that become nearest neighbors of any given relevant document when query-sensitive measures are used. These results suggest that query-sensitive similarity measures have the potential to increase the effectiveness of cluster-based information retrieval systems.
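As a rough illustration of the idea, the following sketch contrasts a conventional cosine measure with a query-sensitive one. It then counts how many relevant documents appear among each relevant document's nearest neighbors. The abstract does not give the formulas of the three measures, so the form used here is an assumption: the cosine between two documents multiplied by the cosine between the query and the terms the two documents share. The sparse dictionary representation, the summed weights for shared terms, the toy collection, and the helper names (`query_sensitive_sim`, `relevant_neighbours`) are also assumptions, and the names are hypothetical.

```python
import math
from typing import Callable, Dict, Set

# Sparse document/query representation: term -> weight (an assumption for this sketch).
Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Conventional cosine similarity; it ignores the query entirely."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def common_terms(a: Vector, b: Vector) -> Vector:
    """Terms the two documents share; summing their weights is an assumption."""
    return {t: w + b[t] for t, w in a.items() if t in b}


def query_sensitive_sim(a: Vector, b: Vector, query: Vector) -> float:
    """Hypothetical query-sensitive measure: the document-document cosine,
    scaled by how closely the shared terms match the query."""
    return cosine(a, b) * cosine(common_terms(a, b), query)


def relevant_neighbours(docs: Dict[str, Vector], relevant: Set[str],
                        sim: Callable[[Vector, Vector], float], k: int) -> float:
    """Average number of relevant documents among the k nearest neighbors
    (under `sim`) of each relevant document."""
    counts = []
    for i in relevant:
        others = sorted((j for j in docs if j != i),
                        key=lambda j: sim(docs[i], docs[j]), reverse=True)
        counts.append(sum(1 for j in others[:k] if j in relevant))
    return sum(counts) / len(counts) if counts else 0.0


# Toy collection (invented for illustration): r1 and r2 are relevant to the
# query and share only the query term; n1 and n2 each share an off-topic term
# with r1 and r2 respectively.
query = {"cluster": 1.0}
docs = {
    "r1": {"cluster": 1.0, "music": 1.0},
    "r2": {"cluster": 1.0, "film": 1.0},
    "n1": {"music": 1.0},
    "n2": {"film": 1.0},
}
relevant = {"r1", "r2"}

print(relevant_neighbours(docs, relevant, cosine, k=1))  # → 0.0
print(relevant_neighbours(
    docs, relevant, lambda a, b: query_sensitive_sim(a, b, query), k=1))  # → 1.0
```

On this toy collection, plain cosine pairs each relevant document with the non-relevant document that shares its off-topic term. The query-sensitive measure instead pairs the two relevant documents, because the only term they share is the query term. Multiplying the two cosines is one simple way to let a pair score highly only when the documents are similar *and* their overlap concerns the query. Other combinations, such as a weighted sum, would fit the description in the abstract equally well.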