On the complexity of Rocchio's similarity-based relevance feedback algorithm

Authors:
Zhixiang Chen;Bin Fu
Affiliations:
Department of Computer Science, University of Texas-Pan American, 1201 W. University Drive, Edinburg, TX 78541-2999;Department of Computer Science, University of Texas-Pan American, 1201 W. University Drive, Edinburg, TX 78541-2999
Venue:
Journal of the American Society for Information Science and Technology
Year:
2007

Abstract

Rocchio's similarity-based relevance feedback algorithm, one of the most important query reformation methods in information retrieval, is essentially an adaptive learning algorithm from examples in searching for documents represented by a linear classifier. Despite its popularity in various applications, there is little rigorous analysis of its learning complexity in the literature. In this article, the authors prove for the first time that the learning complexity of Rocchio's algorithm is $O(d + d^2(\log d + \log n))$ over the discretized vector space $\{0, \ldots, n-1\}^d$ when the inner product similarity measure is used. The upper bound on the learning complexity for searching for documents represented by a monotone linear classifier $(\vec{q}, 0)$ over $\{0, \ldots, n-1\}^d$ can be improved to, at most, $1 + 2k(n-1)(\log d - \log(n-1))$, where $k$ is the number of nonzero components in $\vec{q}$. Several lower bounds on the learning complexity are also obtained for Rocchio's algorithm. For example, the authors prove that Rocchio's algorithm has a lower bound of $\Omega\left(\binom{d}{2}\log n\right)$ on its learning complexity over the Boolean vector space $\{0,1\}^d$. © 2007 Wiley Periodicals, Inc.
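
The adaptive, example-driven view of Rocchio's algorithm with inner product similarity can be illustrated with a small sketch. It is not the authors' formal model. The function names `rocchio_update` and `feedback_loop`, the coefficients `alpha`, `beta`, and `gamma` (all set to 1), the zero initial query, and the choice of one misclassified document per feedback round are assumptions made for illustration. The number of updates it counts is only a rough stand-in for the learning complexity discussed above.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple


def inner(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product similarity between a query and a document vector."""
    return sum(a * b for a, b in zip(u, v))


def rocchio_update(q, relevant, irrelevant, alpha=1.0, beta=1.0, gamma=1.0):
    """Classic Rocchio step: q' = alpha*q + beta*sum(relevant) - gamma*sum(irrelevant).

    The coefficient values are illustrative assumptions, not taken from the article.
    """
    new_q = [alpha * qi for qi in q]
    for x in relevant:
        for i, xi in enumerate(x):
            new_q[i] += beta * xi
    for x in irrelevant:
        for i, xi in enumerate(x):
            new_q[i] -= gamma * xi
    return new_q


def feedback_loop(
    documents: List[Tuple[int, ...]],
    is_relevant: Callable[[Tuple[int, ...]], bool],
    threshold: float = 0.0,
    max_rounds: int = 1000,
):
    """Refine the query until every document is classified correctly.

    A document is predicted relevant when inner(q, x) > threshold.
    Returns the final query vector and the number of updates performed.
    """
    q = [0.0] * len(documents[0])  # assumed zero initial query
    updates = 0
    for _ in range(max_rounds):
        # Ask the (simulated) user about the first misclassified document.
        mistake = next(
            (x for x in documents
             if (inner(q, x) > threshold) != is_relevant(x)),
            None,
        )
        if mistake is None:
            break  # the query now agrees with the target on all documents
        if is_relevant(mistake):
            q = rocchio_update(q, [mistake], [])
        else:
            q = rocchio_update(q, [], [mistake])
        updates += 1
    return q, updates


# Toy collection: all vectors in {0, 1, 2}^2, i.e. n = 3 and d = 2.
docs = list(product(range(3), repeat=2))


def target(x):
    # Hypothetical monotone target: relevant exactly when the first term occurs.
    return x[0] > 0


q, updates = feedback_loop(docs, target)
print(q, updates)
```

In this toy run the target depends on a single component, so the sketch needs very few updates. Harder targets over larger $d$ and $n$ are where the bounds in the abstract become relevant.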