Collaborative Filtering Using a Regression-Based Approach

  • Authors:
  • Slobodan Vucetic;Zoran Obradovic

  • Affiliations:
  • Temple University, Center for Information Science and Technology, 304 Wachman Hall, 1805 North Broad St., 19122, Philadelphia, PA, USA;Temple University, Center for Information Science and Technology, 304 Wachman Hall, 1805 North Broad St., 19122, Philadelphia, PA, USA

  • Venue:
  • Knowledge and Information Systems
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

The task of collaborative filtering is to predict the preferences of an active user for unseen items given preferences of other users. These preferences are typically expressed as numerical ratings. In this paper, we propose a novel regression-based approach that first learns a number of experts describing relationships in ratings between pairs of items. Based on ratings provided by an active user for some of the items, the experts are combined by using statistical methods to predict the user’s preferences for the remaining items. The approach was designed to efficiently address the problem of data sparsity and prediction latency that characterise collaborative filtering. Extensive experiments on Eachmovie and Jester benchmark collaborative filtering data show that the proposed regression-based approach achieves improved accuracy and is orders of magnitude faster than the popular neighbour-based alternative. The difference in accuracy was more evident when the number of ratings provided by an active user was small, as is common for real-life recommendation systems. Additional benefits were observed in predicting items with large rating variability. To provide a more detailed characterisation of the proposed algorithm, additional experiments were performed on synthetic data with second-order statistics similar to that of the Eachmovie data. Strong experimental evidence was obtained that the proposed approach can be applied to data over a large range of sparsity scenarios and is superior to non-personalised predictors even when ratings data are very sparse.