Communications of the ACM
Matrix computations (3rd ed.)
Recommender systems in e-commerce
Proceedings of the 1st ACM conference on Electronic commerce
Analysis of recommendation algorithms for e-commerce
Proceedings of the 2nd ACM conference on Electronic commerce
Concept decompositions for large sparse text data using clustering
Machine Learning
Information-theoretic co-clustering
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Improving recommendation lists through topic diversification
WWW '05 Proceedings of the 14th international conference on World Wide Web
A Scalable Collaborative Filtering Framework Based on Co-Clustering
ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
A Generalized Maximum Entropy Approach to Bregman Co-clustering and Matrix Approximation
The Journal of Machine Learning Research
Large-Scale Parallel Collaborative Filtering for the Netflix Prize
AAIM '08 Proceedings of the 4th international conference on Algorithmic Aspects in Information and Management
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
I/O scalable Bregman co-clustering
PAKDD'08 Proceedings of the 12th Pacific-Asia conference on Advances in knowledge discovery and data mining
Empirical analysis of predictive algorithms for collaborative filtering
UAI'98 Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence
Scalable co-clustering algorithms
ICA3PP'10 Proceedings of the 10th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
Hi-index | 0.00 |
Collaborative filtering (CF) based recommender systems have gained wide popularity in Internet companies like Amazon, Netflix, Google News, and others. These systems make automatic predictions about the interests of a user by inferring from information about like-minded users. Real-time CF on highly sparse massive datasets, while achieving a high prediction accuracy, is a computationally challenging problem. In this paper, we present a novel design for soft real-time (less than 10 sec.) distributed co-clustering based Collaborative Filtering algorithm. Our distributed algorithm has been optimized for multi-core cluster architectures using pipelined parallelism, computation communication overlap and communication optimizations. Theoretical parallel time complexity analysis of our algorithm proves the efficacy of our approach. Using the Netflix dataset (100M ratings), we demonstrate the performance and scalability of our algorithm on 1024-node Blue Gene/P system. Our distributed algorithm (implemented using OpenMP with MPI) delivered training time of around 6s on the full Netflix dataset and prediction time of 2.5s on 1.4M ratings (1.78µs per rating prediction). Our training time is around 20× (more than one order of magnitude) better than the best known parallel training time, along with high accuracy (0.87±0.02 RMSE). To the best of our knowledge, this is the best known parallel performance for collaborative filtering on Netflix data at such high accuracy and also the first such implementation on multi-core cluster architectures such as Blue Gene/P.