Distributed scalable collaborative filtering algorithm

Authors:
Ankur Narang;Abhinav Srivastava;Naga Praveen Kumar Katta
Affiliations:
IBM India Research Laboratory, New Delhi;IBM India Research Laboratory, New Delhi;IBM India Research Laboratory, New Delhi
Venue:
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
Year:
2011

Abstract

Collaborative filtering (CF) based recommender systems have gained wide popularity at Internet companies such as Amazon, Netflix, and Google News. These systems automatically predict a user's interests by drawing on information about like-minded users. Real-time CF on massive, highly sparse datasets that also achieves high prediction accuracy is a computationally challenging problem. In this paper, we present a novel design for a soft real-time (under 10 s) distributed collaborative filtering algorithm based on co-clustering. Our distributed algorithm is optimized for multi-core cluster architectures using pipelined parallelism, computation-communication overlap, and communication optimizations. A theoretical parallel time complexity analysis of our algorithm establishes the efficacy of our approach. Using the Netflix dataset (100M ratings), we demonstrate the performance and scalability of our algorithm on a 1024-node Blue Gene/P system. Our distributed algorithm (implemented using OpenMP with MPI) delivered a training time of around 6 s on the full Netflix dataset and a prediction time of 2.5 s on 1.4M ratings (1.78 µs per rating prediction). Our training time is around 20× (more than one order of magnitude) better than the best known parallel training time, while maintaining high accuracy (0.87±0.02 RMSE). To the best of our knowledge, this is the best known parallel performance for collaborative filtering on Netflix data at such high accuracy, and also the first such implementation on a multi-core cluster architecture such as Blue Gene/P.
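
The abstract does not spell out the prediction rule, so the following is only a minimal single-machine sketch of one commonly used co-clustering predictor: the co-cluster average plus user and item offsets from their cluster averages. The function names (`fit_averages`, `predict`, `rmse`), the toy ratings, and the fixed cluster assignments are all assumptions made for illustration; nothing here reproduces the paper's distributed OpenMP/MPI implementation or its training loop. The last lines check the per-rating prediction time quoted above.

```python
import math
from collections import defaultdict


def fit_averages(ratings, row_cluster, col_cluster):
    """Compute the averages used by the predictor.

    ratings: list of (user, item, rating) triples.
    row_cluster / col_cluster: hypothetical, already-fixed cluster
    assignments for users and items (training is not shown here).
    """
    names = ("user", "item", "row", "col", "cocluster")
    sums = {name: defaultdict(lambda: [0.0, 0]) for name in names}
    for u, i, r in ratings:
        g, h = row_cluster[u], col_cluster[i]
        keys = {"user": u, "item": i, "row": g, "col": h, "cocluster": (g, h)}
        for name, key in keys.items():
            acc = sums[name][key]
            acc[0] += r  # running sum of ratings
            acc[1] += 1  # running count of ratings
    return {name: {k: s / n for k, (s, n) in table.items()}
            for name, table in sums.items()}


def predict(avg, row_cluster, col_cluster, u, i):
    """Co-cluster average plus user and item bias terms.

    Assumes the user, the item, and their co-cluster all have at least
    one observed rating; a real system needs fallbacks for cold starts.
    """
    g, h = row_cluster[u], col_cluster[i]
    return (avg["cocluster"][(g, h)]
            + (avg["user"][u] - avg["row"][g])
            + (avg["item"][i] - avg["col"][h]))


def rmse(pairs):
    """Root mean squared error over (predicted, actual) pairs."""
    pairs = list(pairs)
    return math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))


# Toy data (assumed, not from the paper).
ratings = [("u1", "a", 5.0), ("u1", "b", 4.0),
           ("u2", "a", 4.0), ("u2", "b", 3.0),
           ("u3", "c", 1.0), ("u3", "d", 2.0)]
row_cluster = {"u1": 0, "u2": 0, "u3": 1}
col_cluster = {"a": 0, "b": 0, "c": 1, "d": 1}

avg = fit_averages(ratings, row_cluster, col_cluster)
p_u1_a = predict(avg, row_cluster, col_cluster, "u1", "a")
p_u2_b = predict(avg, row_cluster, col_cluster, "u2", "b")

# Per-rating prediction time implied by the abstract: 2.5 s over 1.4M ratings.
per_rating_us = 2.5 / 1.4e6 * 1e6
print(p_u1_a, p_u2_b, round(per_rating_us, 2))
```

With fixed cluster assignments, each prediction is a handful of table lookups, which is consistent with a per-rating cost in the microsecond range; the expensive part of such methods is the training step that finds the cluster assignments, which this sketch leaves out.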