Optimization of some factors affecting the performance of query expansion

  • Authors:
  • Young Mee Chung; Jae Yun Lee

  • Affiliations:
  • Department of Library and Information Science, Yonsei University, 134 Shinchondong, Sodaemunku, Seoul, South Korea (both authors)

  • Venue:
  • Information Processing and Management: an International Journal
  • Year:
  • 2004

Abstract

This paper examines the factors affecting the performance of global query expansion based on term co-occurrence data and suggests a way to maximize retrieval effectiveness. The major parameters optimized through experiments are the term similarity measure and the weighting scheme for additional terms. The evaluation of four similarity measures in query expansion reveals that mutual information and Yule's Y, which emphasize low-frequency terms, achieve better performance than the cosine and Jaccard coefficients, which have the reverse tendency. In the evaluation of three weighting schemes, similarity weights perform well only with short queries, whereas fixed weights of approximately 0.5 and similarity rank weights are effective with queries of any length. Furthermore, the optimal similarity rank weight, which achieves the best overall performance, appears to be the least affected by the test collection and the number of additional terms. For retrieval efficiency, the number of additional terms need not exceed 70 in our test collections, although the optimal number may vary with the characteristics of the similarity measure employed.
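To make the contrast between the four similarity measures concrete, here is a minimal sketch that scores candidate expansion terms from document co-occurrence counts. It assumes the textbook binary forms of each measure computed over a 2x2 contingency table (a = documents with both terms, b = query term only, c = candidate only, d = neither) and uses pointwise mutual information; the paper's exact variants may differ. The names `Cooccurrence` and `top_k_expansion_terms` and all counts are hypothetical illustrations, and the sketch does not implement the paper's weighting schemes.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Cooccurrence:
    """Hypothetical 2x2 document co-occurrence counts for query term x and candidate y.

    a: documents containing both x and y
    b: documents containing x but not y
    c: documents containing y but not x
    d: documents containing neither
    """
    a: int
    b: int
    c: int
    d: int

    @property
    def n(self) -> int:
        # Total number of documents in the collection.
        return self.a + self.b + self.c + self.d


def cosine(t: Cooccurrence) -> float:
    # Cosine of binary occurrence vectors: a / sqrt(df_x * df_y).
    denom = math.sqrt((t.a + t.b) * (t.a + t.c))
    return t.a / denom if denom else 0.0


def jaccard(t: Cooccurrence) -> float:
    # Documents with both terms divided by documents with either term.
    denom = t.a + t.b + t.c
    return t.a / denom if denom else 0.0


def mutual_information(t: Cooccurrence) -> float:
    # Pointwise mutual information: log2(P(x, y) / (P(x) * P(y))).
    denom = (t.a + t.b) * (t.a + t.c)
    if t.a == 0 or denom == 0:
        return float("-inf")
    return math.log2(t.a * t.n / denom)


def yules_y(t: Cooccurrence) -> float:
    # Yule's Y: (sqrt(ad) - sqrt(bc)) / (sqrt(ad) + sqrt(bc)).
    s_ad = math.sqrt(t.a * t.d)
    s_bc = math.sqrt(t.b * t.c)
    denom = s_ad + s_bc
    return (s_ad - s_bc) / denom if denom else 0.0


MEASURES = {
    "cosine": cosine,
    "jaccard": jaccard,
    "mutual_information": mutual_information,
    "yules_y": yules_y,
}


def top_k_expansion_terms(candidates, measure, k):
    """Rank candidate terms by a similarity measure and keep the k highest-scoring ones."""
    scored = [(term, measure(table)) for term, table in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical counts over 1000 documents; the query term occurs in 100 of them.
    candidates = {
        "rare_term": Cooccurrence(a=5, b=95, c=0, d=900),          # df = 5
        "frequent_term": Cooccurrence(a=50, b=50, c=150, d=750),   # df = 200
    }
    for name, fn in MEASURES.items():
        best, score = top_k_expansion_terms(candidates, fn, k=1)[0]
        print(f"{name}: {best} ({score:.3f})")
```

With these made-up counts, cosine and Jaccard rank `frequent_term` first, while mutual information and Yule's Y rank `rare_term` first. This is one illustration of the low-frequency bias the abstract attributes to the latter two measures, not a reproduction of the paper's experiments.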