Probabilistic co-relevance for query-sensitive similarity measurement in information retrieval

Authors:
Seung-Hoon Na
Affiliations:
Natural Language Processing Research Team, Electronics and Telecommunications Research Institute, South Korea
Venue:
Information Processing and Management: an International Journal
Year:
2013

Citing 31
Cited 1

Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval

SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Pivoted document length normalization

SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Reexamining the cluster hypothesis: scatter/gather on retrieval results

SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Hierarchic document classification using Ward's clustering method

Proceedings of the 9th annual international ACM SIGIR conference on Research and development in information retrieval
The cluster hypothesis revisited

SIGIR '85 Proceedings of the 8th annual international ACM SIGIR conference on Research and development in information retrieval
A language modeling approach to information retrieval

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
A hidden Markov model information retrieval system

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
A vector space model for automatic indexing

Communications of the ACM
Document language models, query models, and risk minimization for information retrieval

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Relevance based language models

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
A study of smoothing methods for language models applied to Ad Hoc information retrieval

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Query-sensitive similarity measures for the calculation of interdocument relationships

Proceedings of the tenth international conference on Information and knowledge management
The effectiveness of query-specific hierarchic clustering in information retrieval

Information Processing and Management: an International Journal
A Linguistically Motivated Probabilistic Model of Information Retrieval

ECDL '98 Proceedings of the Second European Conference on Research and Advanced Technology for Digital Libraries
Cluster-based retrieval using language models

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Corpus structure, language models, and ad hoc information retrieval

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
PageRank without hyperlinks: structural re-ranking using links induced by language models

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Query-sensitive similarity measures for information retrieval

Knowledge and Information Systems
A parallel derivation of probabilistic information retrieval models

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Query-Sensitive Similarity Measure for Content-Based Image Retrieval

ICDM '06 Proceedings of the Sixth International Conference on Data Mining
Language model information retrieval with document expansion

HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
An empirical study of query expansion and cluster-based retrieval in language modeling approach

Information Processing and Management: an International Journal - Special issue: AIRS2005: Information retrieval research in Asia
Relevance models for topic detection and tracking

HLT '02 Proceedings of the second international conference on Human Language Technology Research
An analysis on document length retrieval trends in language modeling smoothing

Information Retrieval
Query-sensitive mutual reinforcement chain and its application in query-oriented multi-document summarization

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
A New Measure of the Cluster Hypothesis

ICTIR '09 Proceedings of the 2nd International Conference on Theory of Information Retrieval: Advances in Information Retrieval Theory
A comparative study of methods for estimating query language models with pseudo feedback

Proceedings of the 18th ACM conference on Information and knowledge management
Probabilistic document length priors for language models

ECIR'08 Proceedings of the IR research, 30th European conference on Advances in information retrieval
Fast query expansion using approximations of relevance models

CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
A Generative Theory of Relevance

A Generative Theory of Relevance
The optimum clustering framework: implementing the cluster hypothesis

Information Retrieval

A design of knowledge management tool for supporting product development

Information Processing and Management: an International Journal

Abstract

Interdocument similarities are the fundamental information source required in cluster-based retrieval, an advanced retrieval approach that significantly improves performance in information retrieval (IR). One effective similarity metric is query-sensitive similarity, which was introduced by Tombros and van Rijsbergen as a method to more directly satisfy the cluster hypothesis that forms the basis of cluster-based retrieval. Although this method is reported to be effective, existing applications of query-sensitive similarity are still limited to vector space models, with no connection to probabilistic approaches. We suggest a probabilistic framework that defines query-sensitive similarity based on probabilistic co-relevance, where the similarity between two documents is proportional to the probability that both are relevant to a given query. We further simplify the proposed co-relevance-based similarity by decomposing it into two separate relevance models. We then formulate all the requisite components of the proposed similarity metric in terms of scoring functions used by language modeling methods. Experimental results obtained using standard TREC test collections consistently showed that the proposed query-sensitive similarity measure performs better than term-based similarity and existing query-sensitive similarity in the context of Voorhees' nearest neighbor test (NNT).
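
To make the idea of co-relevance concrete, the following is a minimal sketch and not the paper's actual formulation. It assumes that the two documents are conditionally independent given the query, so the co-relevance probability factors into a product of two per-document relevance estimates, and it assumes each estimate can be approximated by a Dirichlet-smoothed query likelihood normalized over a small candidate set. The function names, the toy documents, and the smoothing value `mu` are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def relevance_posteriors(query_tokens, docs, mu=10.0):
    """Approximate P(relevant | d, q) for each document (illustrative assumption).

    Scores each document by a Dirichlet-smoothed query log-likelihood and
    normalizes the exponentiated scores over the candidate set.
    """
    collection_counts = Counter(t for tokens in docs.values() for t in tokens)
    collection_len = sum(collection_counts.values())

    log_scores = {}
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        score = 0.0
        for term in query_tokens:
            p_coll = collection_counts[term] / collection_len
            if p_coll == 0.0:
                # Skip query terms unseen in the collection to avoid log(0).
                continue
            score += math.log((counts[term] + mu * p_coll) / (len(tokens) + mu))
        log_scores[doc_id] = score

    # Subtract the maximum before exponentiating for numerical stability.
    max_log = max(log_scores.values())
    weights = {d: math.exp(s - max_log) for d, s in log_scores.items()}
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()}


def co_relevance_similarity(doc_a, doc_b, posteriors):
    """Query-sensitive similarity as a product of two relevance estimates.

    This relies on the independence assumption stated above; it is a
    simplification for illustration only.
    """
    return posteriors[doc_a] * posteriors[doc_b]


# Hypothetical toy collection and query.
docs = {
    "d1": "cluster retrieval language model".split(),
    "d2": "language model smoothing retrieval".split(),
    "d3": "image segmentation color texture".split(),
}
query = "language model".split()

posteriors = relevance_posteriors(query, docs)
sim_12 = co_relevance_similarity("d1", "d2", posteriors)
sim_13 = co_relevance_similarity("d1", "d3", posteriors)
print(sim_12, sim_13)
```

In this toy setting, two documents that both match the query receive a higher similarity than a pair in which only one matches, which is the behavior a query-sensitive measure is meant to capture. Note that under this simplification the score does not depend on direct term overlap between the two documents; a fuller treatment would need the components the abstract describes.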