An empirical study of query expansion and cluster-based retrieval in language modeling approach

  • Authors:
  • Seung-Hoon Na; In-Su Kang; Ji-Eun Roh; Jong-Hyeok Lee

  • Affiliations:
  • Division of Electrical and Computer Engineering, POSTECH, AITrc, Republic of Korea (all four authors)

  • Venue:
  • AIRS'05 Proceedings of the Second Asia conference on Asia Information Retrieval Technology
  • Year:
  • 2005

Abstract

In information retrieval, the word-mismatch problem is a critical issue. Several techniques have been developed to resolve it, such as query expansion, cluster-based retrieval, and dimensionality reduction. This paper presents an empirical study of two of these techniques: query expansion and cluster-based retrieval. We examine the effect of using parsimony in query expansion and the effect of the choice of clustering algorithm in cluster-based retrieval. In addition, query expansion and cluster-based retrieval are compared, and their combinations are evaluated in terms of retrieval performance. From experiments on seven NTCIR and TREC test collections, we conclude that 1) query expansion using parsimony performs well, 2) cluster-based retrieval with agglomerative clustering is better than with partitioning clustering, 3) query expansion is generally more effective than cluster-based retrieval in resolving the word-mismatch problem, and 4) their combinations are effective when each method on its own significantly improves on baseline performance.