Towards effective genomic information retrieval: The impact of query complexity and expansion strategies

  • Authors:
  • Xiangming Mu; Kun Lu

  • Affiliations:
  • School of Information Studies, University of Wisconsin-Milwaukee, Milwaukee, WI, USA; School of Information Studies, University of Wisconsin-Milwaukee, Milwaukee, WI, USA

  • Venue:
  • Journal of Information Science
  • Year:
  • 2010


Abstract

The goal of this study is to examine how query complexity and different query expansion strategies influence the effectiveness of genomic information retrieval. Query complexity is defined as the average number of terms in a query. The query expansion strategies are based on the Unified Medical Language System (UMLS) Metathesaurus: combining string or word indexing with expansion at the concept or term level yields four automatic UMLS expansion strategies. The test collection for the study is the TREC Genomics Track 2006 data set (24 topics). Retrieval performance is evaluated with mean average precision (MAP), 11-point precision/recall, and average precision/recall at the maximum recall point. In general, we found that applying query expansion did not improve retrieval effectiveness compared with the baseline. Our results also indicate that string index expansions are more effective than word index expansions. Queries with fewer terms outperform queries with more terms, and the difference is statistically significant. String indexing at the term level is the best automatic UMLS expansion strategy for queries with fewer terms, while the worst case is applying word index expansions to queries with more terms. Based on these findings, we recommend that genomic information retrieval systems support flexible query expansion strategies to accommodate queries of differing complexity.
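The abstract names MAP and 11-point precision/recall as its evaluation measures. The block below is a minimal sketch of how these standard TREC-style measures are commonly computed at the document level; it is not the authors' evaluation code, which the abstract does not describe. The function names and the dict-based layout for runs and relevance judgments (`runs`, `qrels`) are hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence, Set


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Non-interpolated average precision for one topic.

    Adds precision at each rank where a relevant document appears, then
    divides by the total number of relevant documents, so relevant
    documents that were never retrieved contribute zero.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs: Dict[str, Sequence[str]],
                           qrels: Dict[str, Set[str]]) -> float:
    """MAP: mean of per-topic average precision over every judged topic.

    Hypothetical layout: `runs` maps topic id -> ranked doc ids and
    `qrels` maps topic id -> set of relevant doc ids. A topic with no
    run scores 0.0.
    """
    scores = [average_precision(runs.get(topic, []), relevant)
              for topic, relevant in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


def eleven_point_precision(ranked: Sequence[str],
                           relevant: Set[str]) -> List[float]:
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0.

    Interpolated precision at recall level r is the highest precision
    observed at any rank whose recall is at least r (0.0 if none).
    """
    points = []  # (recall, precision) recorded after each relevant hit
    hits = 0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    levels = [i / 10 for i in range(11)]
    return [max((p for r, p in points if r >= level), default=0.0)
            for level in levels]


if __name__ == "__main__":
    ranked = ["d1", "d2", "d3", "d4", "d5"]
    relevant = {"d1", "d3", "d6"}
    print(average_precision(ranked, relevant))
    print(eleven_point_precision(ranked, relevant))
```

For example, with the ranking `d1 … d5` and relevant set `{d1, d3, d6}`, relevant documents appear at ranks 1 and 3, so average precision is (1/1 + 2/3) / 3 = 5/9; `d6` is never retrieved and lowers the score. Averaging such per-topic values over all topics gives MAP, and averaging the 11 interpolated values per recall level across topics gives the usual 11-point curve.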