TopRecs: Top-k algorithms for item-based collaborative filtering

  • Authors:
  • Mohammad Khabbaz;Laks V. S. Lakshmanan

  • Affiliations:
  • University of British Columbia;University of British Columbia

  • Venue:
  • Proceedings of the 14th International Conference on Extending Database Technology
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Recommender systems help users find their items of interest from large data collections with little effort. Collaborative filtering (CF) is one of the most popular approaches for making recommendations. While significant work has been done on improving accuracy of CF methods, some of the most popular CF approaches are limited in terms of scalability and efficiency. The size of data in modern recommender systems is growing rapidly in terms of both new users and items and new ratings. Item-based recommendation is one of the CF approaches used widely in practice. It computes and uses an item-item similarity matrix in order to predict unknown ratings. Previous works on item-based CF method confirm its usefulness in providing high quality top-k results. In this paper, we design a scalable algorithm for top-k recommendations using this method. We achieve this by probabilistic modeling of the similarity matrix. A unique challenge here is that the ratings that are aggregated to produce the aggregate predicted score for a user should be obtained from different lists for different candidate items and the aggregate function is non-monotone. We propose a layered architecture for CF systems that facilitates computation of the most relevant items for a given user. We design efficient top-k algorithms and data structures in order to achieve high scalability. Our algorithm is based on abstracting the key computation of a CF algorithm in terms of two operations -- probe and explore. The algorithm uses a cost-based optimization whereby we express the overall cost as a function of a similarity threshold and determine its optimal value for minimizing the cost. We empirically evaluate our theoretical results on a large real world dataset. Our experiments show our exact top-k algorithm achieves better scalability compared to solid baseline algorithms.