A Theoretical Analysis of Query Selection for Collaborative Filtering

  • Authors:
  • Sanjoy Dasgupta; Wee Sun Lee; Philip M. Long

  • Affiliations:
  • Computer Science Department, University of California at San Diego, USA (dasgupta@cs.ucsd.edu); Computer Science Department and Singapore-MIT Alliance, National University of Singapore (leews@comp.nus.edu.sg); Genome Institute of Singapore (gislongp@nus.edu.sg)

  • Venue:
  • Machine Learning
  • Year:
  • 2003

Abstract

We consider the problem of determining which of a set of experts has tastes most similar to a given user by asking the user questions about his likes and dislikes. We describe a simple algorithm for generating queries for a theoretical model of this problem. We show that the algorithm requires at most opt(F)(ln(|F|/opt(F)) + 1) + 1 queries to find the correct expert, where opt(F) is the optimal worst-case bound on the number of queries for learning arbitrary elements of the set of experts F. The algorithm runs in time polynomial in |F| and |X| (where X is the domain), and we prove that no polynomial-time algorithm can have a significantly better bound on the number of queries unless all problems in NP have n^{O(log log n)}-time algorithms.

We also study a more general case where the user ratings come from a finite set Y and there is an integer-valued loss function ℓ on Y that is used to measure the distance between the ratings. Assuming that the loss function is a metric and that there is an expert within a distance η from the user, we give a polynomial-time algorithm that is guaranteed to find such an expert after at most 2 opt(F, η) ln(|F|/(1 + deg(F, η))) + 2(η + 1)(1 + deg(F, η)) queries, where deg(F, η) is the largest number of experts in F that are within a distance 2η of any f ∈ F.
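The abstract does not spell out the query-selection rule, so the Python below is only a minimal sketch of the noise-free setting under stated assumptions: ratings are integers, the experts are distinct, and the user's ratings match some expert exactly. It uses a greedy rule that asks about the item whose least informative answer still eliminates the most experts. That rule, and the names `find_expert`, `opt` and `query_bound`, are hypothetical illustrations, not necessarily the authors' algorithm. `opt` computes opt(F) by brute-force minimax over decision trees, which takes exponential time and is only usable on tiny sets of experts.

```python
import math
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Hypothetical representation: an expert is a tuple of ratings, one per item in X.
Expert = Tuple[int, ...]


def split(version_space: Iterable[Expert], x: int) -> Dict[int, List[Expert]]:
    """Group experts by the rating they give to item x."""
    groups: Dict[int, List[Expert]] = {}
    for f in version_space:
        groups.setdefault(f[x], []).append(f)
    return groups


def worst_case_eliminated(version_space: List[Expert], x: int) -> int:
    """Experts removed by querying x if the user gives the least informative answer."""
    groups = split(version_space, x)
    return len(version_space) - max(len(g) for g in groups.values())


def find_expert(experts: List[Expert], user: Expert) -> Tuple[Expert, int]:
    """Greedy querying; assumes distinct experts and that the user equals one of them."""
    num_items = len(user)
    version_space = list(experts)
    queries = 0
    while len(version_space) > 1:
        # Greedy choice: maximize the worst-case number of eliminated experts.
        x = max(range(num_items), key=lambda i: worst_case_eliminated(version_space, i))
        # Keep only the experts that agree with the user's rating of item x.
        version_space = [f for f in version_space if f[x] == user[x]]
        queries += 1
    return version_space[0], queries


def opt(experts: List[Expert]) -> int:
    """Exact worst-case optimal number of queries (exponential time; tiny F only)."""
    num_items = len(experts[0])
    memo: Dict[FrozenSet[Expert], int] = {}

    def solve(vs: FrozenSet[Expert]) -> int:
        if len(vs) <= 1:
            return 0
        if vs not in memo:
            best = math.inf
            for x in range(num_items):
                groups = split(vs, x)
                if len(groups) > 1:  # skip items on which every expert in vs agrees
                    best = min(best, 1 + max(solve(frozenset(g)) for g in groups.values()))
            memo[vs] = best
        return memo[vs]

    return solve(frozenset(experts))


def query_bound(num_experts: int, opt_value: int) -> float:
    """The abstract's bound opt(F)(ln(|F|/opt(F)) + 1) + 1, assuming opt(F) >= 1."""
    return opt_value * (math.log(num_experts / opt_value) + 1) + 1


# Four "singleton" experts: expert i likes item i and dislikes every other item.
singletons = [tuple(int(i == j) for j in range(4)) for i in range(4)]
found, used = find_expert(singletons, singletons[3])
best = opt(singletons)
print(found, used, best, round(query_bound(len(singletons), best), 2))
# prints: (0, 0, 0, 1) 3 3 4.86
```

On the singleton experts, every query can be answered so that it rules out only one expert, so both the greedy rule and the optimal strategy need 3 queries, within the bound of about 4.86. Greedy selection is a natural fit for a bound of the form opt(F)(ln(|F|/opt(F)) + 1) + 1, which has the same shape as the guarantee for greedy set cover.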