An empirical study of query expansion and cluster-based retrieval in language modeling approach

Authors:
Seung-Hoon Na; In-Su Kang; Ji-Eun Roh; Jong-Hyeok Lee
Affiliations:
Division of Electrical and Computer Engineering, POSTECH, AITrc, Republic of Korea (all authors)
Venue:
AIRS'05 Proceedings of the Second Asia conference on Asia Information Retrieval Technology
Year:
2005

Abstract

In information retrieval, the word mismatch problem is a critical issue. Several techniques have been developed to address it, including query expansion, cluster-based retrieval, and dimensionality reduction. This paper presents an empirical study of two of them: query expansion and cluster-based retrieval. We examine the effect of parsimony in query expansion and the effect of the choice of clustering algorithm in cluster-based retrieval. In addition, we compare query expansion with cluster-based retrieval and evaluate their combinations in terms of retrieval performance. From experiments on seven NTCIR and TREC test collections, we conclude that 1) query expansion using parsimony performs well, 2) cluster-based retrieval using agglomerative clustering outperforms cluster-based retrieval using partitional clustering, 3) query expansion is generally more effective than cluster-based retrieval in resolving the word mismatch problem, and 4) combining the two is effective when each method by itself significantly improves on the baseline.
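The abstract compares query expansion, cluster-based retrieval, and their combination within the language modeling approach, but does not spell out the estimators. The Python sketch below shows one generic way each piece is commonly realized, under stated assumptions: a hypothetical four-document corpus with a fixed, hand-made clustering; Jelinek-Mercer smoothing for the baseline; cluster-based smoothing in the style of Liu and Croft (document, cluster, and collection models interpolated); and a parsimonious feedback model, in the spirit of Hiemstra et al., estimated by EM against the collection model with low-probability terms pruned and then interpolated with the original query. All function names and parameter values (`lam`, `beta`, `alpha`, `threshold`) are illustrative assumptions, not the paper's; its actual estimators and settings may differ.

```python
# Illustrative sketch only; not the paper's implementation.
import math
from collections import Counter

# Hypothetical toy corpus: document id -> tokens.
DOCS = {
    "d1": "car engine repair manual".split(),
    "d2": "automobile engine maintenance guide".split(),
    "d3": "automobile repair shop".split(),
    "d4": "pasta recipe cooking guide".split(),
}
# Hypothetical precomputed clustering (agglomerative or partitional in practice).
CLUSTERS = [["d1", "d2", "d3"], ["d4"]]


def ml(tokens):
    """Maximum-likelihood unigram model P(w | tokens)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


DOC_ML = {d: ml(toks) for d, toks in DOCS.items()}
COLL = ml([w for toks in DOCS.values() for w in toks])
CLUSTER_OF = {d: ml([w for m in c for w in DOCS[m]]) for c in CLUSTERS for d in c}


def p_doc(w, d, lam=0.7, beta=0.0):
    """Smoothed P(w|D): document, then cluster (weight beta), then collection.
    beta=0 gives plain Jelinek-Mercer smoothing (the baseline); requires 0 <= beta < 1."""
    bg = beta * CLUSTER_OF[d].get(w, 0.0) + (1 - beta) * COLL.get(w, 0.0)
    return lam * DOC_ML[d].get(w, 0.0) + (1 - lam) * bg


def score(query_model, d, **kw):
    """Negative cross-entropy of the query model against the smoothed document model."""
    return sum(q * math.log(p_doc(w, d, **kw)) for w, q in query_model.items() if w in COLL)


def rank(query_model, **kw):
    return sorted(DOCS, key=lambda d: score(query_model, d, **kw), reverse=True)


def parsimonious_feedback(fb_docs, lam=0.5, iters=20, threshold=0.01):
    """EM estimate of a feedback model that explains feedback-document terms not
    already explained by the collection model; terms below threshold are pruned."""
    counts = Counter(w for d in fb_docs for w in DOCS[d])
    p = ml([w for d in fb_docs for w in DOCS[d]])
    for _ in range(iters):
        # E-step: expected counts attributed to the feedback model.
        e = {w: counts[w] * lam * p[w] / (lam * p[w] + (1 - lam) * COLL[w]) for w in p}
        z = sum(e.values())
        # M-step with pruning; fall back to the unpruned estimate if all terms are pruned.
        p = {w: v / z for w, v in e.items()}
        kept = {w: v for w, v in p.items() if v >= threshold} or p
        z = sum(kept.values())
        p = {w: v / z for w, v in kept.items()}
    return p


def expand(query, fb_docs, alpha=0.5):
    """Interpolate the original query model with the parsimonious feedback model."""
    q, f = ml(query), parsimonious_feedback(fb_docs)
    return {w: (1 - alpha) * q.get(w, 0.0) + alpha * f.get(w, 0.0) for w in q.keys() | f.keys()}


if __name__ == "__main__":
    query = "car repair".split()
    baseline = rank(ml(query))
    clustered = rank(ml(query), beta=0.5)
    expanded = rank(expand(query, baseline[:2]))
    combined = rank(expand(query, clustered[:2]), beta=0.5)
    for name, ranking in [("baseline", baseline), ("cluster", clustered),
                          ("expansion", expanded), ("combined", combined)]:
        print(name, ranking)
```

Setting `beta=0` recovers the non-cluster baseline, so the four `rank` calls mirror the four configurations the abstract compares: baseline, cluster-based retrieval, query expansion, and their combination. In this toy corpus the cluster model lets `d2`, which shares no query term, outrank the off-topic `d4`.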