Some Formal Analysis of Rocchio's Similarity-Based Relevance Feedback Algorithm

Authors:
Zhixiang Chen; Binhai Zhu
Venue:
ISAAC '00 Proceedings of the 11th International Conference on Algorithms and Computation
Year:
2000

Abstract

Rocchio's similarity-based relevance feedback algorithm, one of the most important query reformulation methods in information retrieval, is essentially an adaptive supervised algorithm that learns from examples. Despite its popularity in various applications, there is little rigorous analysis of its learning complexity in the literature. In this paper we show that in the Boolean vector space model, if the initial query vector is 0, then for any of the four typical similarities (inner product, Dice coefficient, cosine coefficient, and Jaccard coefficient), Rocchio's similarity-based relevance feedback algorithm makes at least n mistakes when used to search for a collection of documents represented by a monotone disjunction of at most k relevant features (or terms) over the n-dimensional Boolean vector space {0, 1}^n. When an arbitrary initial query vector in {0, 1}^n is used, it makes at least (n + k - 3)/2 mistakes when searching for the same collection of documents. These linear lower bounds hold regardless of the threshold and the coefficients that the algorithm uses to update its query vector and to classify documents.
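The setting above (a query vector that is updated only on mistakes, a similarity threshold that decides relevance, and a target collection defined by a monotone disjunction of relevant terms) can be made concrete with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' formulation. It assumes the common textbook form of Rocchio's update: q <- alpha*q + beta*x when a relevant document x is missed, and q <- alpha*q - gamma*x when a non-relevant one is retrieved. The names `alpha`, `beta`, `gamma`, `theta`, `rocchio_mistakes` and `monotone_disjunction` are hypothetical. The Dice and Jaccard coefficients use their usual vector generalizations of the set-based definitions. The sketch only counts mistakes on one given example sequence and does not reproduce the paper's worst-case lower-bound constructions.

```python
import math
from typing import Callable, List, Optional, Sequence, Set, Tuple

Vec = Sequence[float]
Similarity = Callable[[Vec, Vec], float]


def dot(u: Vec, v: Vec) -> float:
    return sum(a * b for a, b in zip(u, v))


# The four similarities named in the abstract, in their usual vector forms.
# A zero denominator (only possible when both vectors are zero) yields 0.
def inner_product(q: Vec, x: Vec) -> float:
    return dot(q, x)


def dice(q: Vec, x: Vec) -> float:
    denom = dot(q, q) + dot(x, x)
    return 2 * dot(q, x) / denom if denom else 0.0


def cosine(q: Vec, x: Vec) -> float:
    denom = math.sqrt(dot(q, q)) * math.sqrt(dot(x, x))
    return dot(q, x) / denom if denom else 0.0


def jaccard(q: Vec, x: Vec) -> float:
    denom = dot(q, q) + dot(x, x) - dot(q, x)
    return dot(q, x) / denom if denom else 0.0


def monotone_disjunction(relevant: Set[int]) -> Callable[[Vec], bool]:
    """Target: x in {0,1}^n is relevant iff x_i = 1 for some i in `relevant`."""
    return lambda x: any(x[i] == 1 for i in relevant)


def rocchio_mistakes(
    examples: Sequence[Vec],
    target: Callable[[Vec], bool],
    sim: Similarity,
    theta: float,           # hypothetical relevance threshold
    alpha: float = 1.0,     # hypothetical weight on the current query
    beta: float = 1.0,      # hypothetical weight for a missed relevant document
    gamma: float = 1.0,     # hypothetical weight for a wrongly retrieved document
    q0: Optional[Vec] = None,
) -> Tuple[int, List[float]]:
    """Run one online pass over `examples`; return (mistakes, final query)."""
    q = [float(v) for v in q0] if q0 is not None else [0.0] * len(examples[0])
    mistakes = 0
    for x in examples:
        predicted = sim(q, x) >= theta  # judged relevant iff similarity reaches theta
        actual = target(x)              # the user's relevance feedback
        if predicted != actual:
            mistakes += 1
            # Add missed relevant documents, subtract wrongly retrieved ones.
            step = beta if actual else -gamma
            q = [alpha * qi + step * xi for qi, xi in zip(q, x)]
    return mistakes, q


if __name__ == "__main__":
    n = 5
    unit_vectors = [[1 if j == i else 0 for j in range(n)] for i in range(n)]
    target = monotone_disjunction({0, 2})  # k = 2 relevant terms
    for name, sim in [("inner product", inner_product), ("Dice", dice),
                      ("cosine", cosine), ("Jaccard", jaccard)]:
        m, q = rocchio_mistakes(unit_vectors, target, sim, theta=0.5)
        print(name, m, q)
```

In this toy run the initial query is 0, so every similarity starts at 0 and, with theta = 0.5, each of the four similarities errs exactly on the two relevant unit vectors. Reproducing the paper's n-mistake bound would require its adversarial example sequences, which this sketch does not attempt.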