Towards a scalable kNN CF algorithm: exploring effective applications of clustering

  • Authors:
  • Al Mamunur Rashid;Shyong K. Lam;Adam LaPitz;George Karypis;John Riedl

  • Affiliations:
  • Computer Science and Engineering, University of Minnesota, Minneapolis, MN;Computer Science and Engineering, University of Minnesota, Minneapolis, MN;Computer Science and Engineering, University of Minnesota, Minneapolis, MN;Computer Science and Engineering, University of Minnesota, Minneapolis, MN;Computer Science and Engineering, University of Minnesota, Minneapolis, MN

  • Venue:
  • WebKDD'06 Proceedings of the 8th Knowledge discovery on the web international conference on Advances in web mining and web usage analysis
  • Year:
  • 2006

Abstract

Collaborative Filtering (CF)-based recommender systems benefit both the users and the operators of sites that carry too much information. Users benefit because they can find items of interest among an unmanageable number of available items. E-commerce sites that employ recommender systems, in turn, can increase sales revenue in at least two ways: a) by drawing customers' attention to items that they are likely to buy, and b) by cross-selling items. However, the sheer number of customers and items typical of e-commerce systems demands specially designed CF algorithms that can cope gracefully with the vast size of the data. Many algorithms proposed thus far, where the principal concern is recommendation quality, may be too expensive to operate in a large-scale system. We propose CLUSTKNN, a simple and intuitive algorithm that is well suited for large data sets. The method first compresses the data tremendously by building a straightforward but efficient clustering model. Recommendations are then generated quickly using a simple NEAREST NEIGHBOR-based approach. We demonstrate the feasibility of CLUSTKNN both analytically and empirically. We also show, by comparing it with a number of other popular CF algorithms, that apart from being highly scalable and intuitive, CLUSTKNN provides very good recommendation accuracy as well.
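
The two-stage idea in the abstract (compress users into a clustering model, then run nearest-neighbor prediction against the clusters instead of against all users) can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm, the similarity measure, or the prediction formula, so everything below is an assumption made for illustration: plain k-means with a deterministic initialization, cosine similarity over co-rated items, and a similarity-weighted average. The function names `build_cluster_model` and `predict` are hypothetical and are not taken from the paper.

```python
import numpy as np

def build_cluster_model(ratings, k, n_iter=20):
    """Compress a (n_users, n_items) rating matrix into k centroids.

    A rating of 0 means "unrated". Assumption: plain k-means with
    Euclidean distance, initialized from evenly spaced user rows
    (a simplification so the sketch is deterministic).
    """
    ratings = np.asarray(ratings, dtype=float)
    init_rows = np.linspace(0, len(ratings) - 1, k).astype(int)
    centroids = ratings[init_rows].copy()
    for _ in range(n_iter):
        # Squared distance from every user to every centroid.
        diffs = ratings[:, None, :] - centroids[None, :, :]
        labels = (diffs ** 2).sum(axis=2).argmin(axis=1)
        for c in range(k):
            members = ratings[labels == c]
            if len(members) == 0:
                continue  # keep the previous centroid for an empty cluster
            counts = (members > 0).sum(axis=0)
            sums = members.sum(axis=0)
            # Mean rating per item over the members who rated it; 0 if nobody did.
            centroids[c] = np.divide(sums, counts,
                                     out=np.zeros_like(sums), where=counts > 0)
    return centroids

def predict(user_ratings, centroids, item, n_neighbors=2):
    """Predict a rating by treating centroids as surrogate neighbors."""
    user_ratings = np.asarray(user_ratings, dtype=float)
    rated = user_ratings > 0
    sims = np.zeros(len(centroids))
    for i, c in enumerate(centroids):
        common = rated & (c > 0)
        if common.any():
            a, b = user_ratings[common], c[common]
            sims[i] = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    # Most similar centroids that actually carry a value for the target item.
    order = np.argsort(-sims)
    neighbors = [i for i in order if centroids[i, item] > 0][:n_neighbors]
    if not neighbors:
        return None
    weights = sims[neighbors]
    values = centroids[neighbors, item]
    if weights.sum() <= 0:
        return float(values.mean())
    return float(weights @ values / weights.sum())
```

In this sketch, prediction cost depends on the number of centroids k rather than on the number of users, which is the source of the scalability the abstract describes: the model is built once offline, and each recommendation compares the active user against only k surrogate profiles.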