On the complexity of Rocchio's similarity-based relevance feedback algorithm

  • Authors:
  • Zhixiang Chen;Bin Fu

  • Affiliations:
  • Department of Computer Science, University of Texas-Pan American, 1201 W. University Drive, Edinburg, TX 78541-2999;Department of Computer Science, University of Texas-Pan American, 1201 W. University Drive, Edinburg, TX 78541-2999

  • Venue:
  • Journal of the American Society for Information Science and Technology
  • Year:
  • 2007

Abstract

Rocchio's similarity-based relevance feedback algorithm, one of the most important query reformation methods in information retrieval, is essentially an adaptive learning algorithm from examples in searching for documents represented by a linear classifier. Despite its popularity in various applications, there is little rigorous analysis of its learning complexity in the literature. In this article, the authors prove for the first time that the learning complexity of Rocchio's algorithm is $O(d + d^2(\log d + \log n))$ over the discretized vector space $\{0, \ldots, n-1\}^d$ when the inner product similarity measure is used. The upper bound on the learning complexity for searching for documents represented by a monotone linear classifier $(\vec{q}, 0)$ over $\{0, \ldots, n-1\}^d$ can be improved to at most $1 + 2k(n-1)(\log d - \log(n-1))$, where $k$ is the number of nonzero components in $\vec{q}$. Several lower bounds on the learning complexity are also obtained for Rocchio's algorithm. For example, the authors prove that Rocchio's algorithm has a lower bound of $\Omega\left(\binom{d}{2} \log n\right)$ on its learning complexity over the Boolean vector space $\{0,1\}^d$. © 2007 Wiley Periodicals, Inc.
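
To make the setting concrete, the following minimal sketch shows one way a Rocchio-style feedback loop with inner product similarity could be simulated, counting query updates as a rough proxy for learning complexity. The abstract does not state the update rule, so everything here is an assumption for illustration: the query starts at the zero vector, a document is judged relevant when its inner product with the query exceeds a threshold of 0, and each misjudged document is added to or subtracted from the query with unit weight. The names `inner_product`, `rocchio_feedback`, and `max_rounds` are hypothetical and do not come from the article.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[int]


def inner_product(u: Vector, v: Vector) -> int:
    """Inner product similarity between two integer vectors."""
    return sum(a * b for a, b in zip(u, v))


def rocchio_feedback(
    documents: List[Vector],
    is_relevant: Callable[[Vector], bool],
    threshold: int = 0,
    max_rounds: int = 10_000,
) -> Tuple[List[int], int]:
    """Simplified Rocchio-style loop (assumed variant, unit weights).

    Returns the learned query vector and the number of updates made.
    """
    query = [0] * len(documents[0])  # assumption: zero initial query
    updates = 0
    for _ in range(max_rounds):
        # Find the first document the current query misjudges.
        mistake = next(
            (doc for doc in documents
             if (inner_product(query, doc) > threshold) != is_relevant(doc)),
            None,
        )
        if mistake is None:
            return query, updates  # consistent with all user judgments
        # Move toward a missed relevant document, away from a false hit.
        sign = 1 if is_relevant(mistake) else -1
        query = [q + sign * x for q, x in zip(query, mistake)]
        updates += 1
    raise RuntimeError("no consistent query found within max_rounds")


# Toy run over the Boolean vector space {0,1}^3, where the (hypothetical)
# target is the monotone classifier "relevant iff the first component is set".
docs = [tuple(v) for v in product(range(2), repeat=3)]


def target(doc: Vector) -> bool:
    return doc[0] > 0


learned_query, num_updates = rocchio_feedback(docs, target)
print(learned_query, num_updates)
```

The update count in such a simulation depends on the order in which misjudged documents are presented, so it illustrates the quantity being bounded rather than reproducing the article's bounds.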