On the complexity of Rocchio's similarity-based relevance feedback algorithm

  • Authors:
  • Zhixiang Chen; Bin Fu

  • Affiliations:
  • Department of Computer Science, University of Texas-Pan American, Edinburg, TX; Department of Computer Science, University of New Orleans, New Orleans, LA

  • Venue:
  • ISAAC '05: Proceedings of the 16th International Conference on Algorithms and Computation
  • Year:
  • 2005

Abstract

In this paper, we prove for the first time that the learning complexity of Rocchio's algorithm is $O(d + d^2(\log d + \log n))$ over the discretized vector space $\{0,\ldots,n-1\}^d$ when the inner product similarity measure is used. The upper bound on the learning complexity for searching for documents represented by a monotone linear classifier $(q, 0)$ over $\{0,\ldots,n-1\}^d$ can be improved to $O(d + 2k(n-1)(\log d + \log(n-1)))$, where $k$ is the number of nonzero components in $q$. An $\Omega(\binom{d}{2}\log n)$ lower bound on the learning complexity is also obtained for Rocchio's algorithm over $\{0,\ldots,n-1\}^d$. In practice, Rocchio's algorithm often uses fixed query updating factors. When this is the case, the lower bound is strengthened to $2^{\Omega(d)}$ over the binary vector space $\{0,1\}^d$. In general, if the query updating factors are bounded by $O(n^c)$ for some constant $c \geq 0$, an $\Omega(n^{d-1-c}/(n-1))$ lower bound is obtained over $\{0,\ldots,n-1\}^d$.
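
To make the setting concrete, the following is a minimal sketch of a Rocchio-style feedback loop with inner product similarity over {0,...,n-1}^d. It is an illustration under stated assumptions, not the formal model analyzed in the paper: the function name `rocchio_feedback`, the decision rule "relevant iff the inner product is strictly positive", the zero initial query, the single fixed updating factor `alpha`, and the choice of the first misclassified document as the feedback example are all hypothetical simplifications. The number of updates it returns plays the role of the learning complexity for one particular run.

```python
from itertools import product

def inner(q, x):
    """Inner product similarity between a query vector q and a document x."""
    return sum(qi * xi for qi, xi in zip(q, x))

def rocchio_feedback(target_q, d, n, alpha=1, max_rounds=10_000):
    """Learn a monotone linear classifier (target_q, 0) from relevance feedback.

    Assumptions (not taken from the paper): a document x is relevant iff
    inner(q, x) > 0, the initial query is the zero vector, and alpha is a
    fixed query updating factor applied to one judged document per round.
    Returns the learned query and the number of updates performed.
    """
    docs = list(product(range(n), repeat=d))      # the whole space {0..n-1}^d
    is_relevant = lambda x: inner(target_q, x) > 0
    q = [0] * d
    for rounds in range(max_rounds + 1):
        # The "user" points out the first document the current query misjudges.
        wrong = next(
            (x for x in docs if (inner(q, x) > 0) != is_relevant(x)), None
        )
        if wrong is None:
            return q, rounds                      # query agrees with the target
        # Move toward a missed relevant document, away from a false positive.
        sign = alpha if is_relevant(wrong) else -alpha
        q = [qi + sign * xi for qi, xi in zip(q, wrong)]
    raise RuntimeError("no consistent query found within max_rounds updates")

if __name__ == "__main__":
    learned_q, updates = rocchio_feedback(target_q=(1, 0, 0), d=3, n=3)
    print(learned_q, updates)
```

The sketch enumerates all n^d documents, so it is only usable for very small d and n; it is meant to show what one update step looks like, not to reproduce the bounds stated in the abstract.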