Efficient, distributed factorization of large matrices on clusters of commodity machines is crucial to applying latent factor models in industrial-scale recommender systems. We propose an efficient, data-parallel approach to low-rank matrix factorization with Alternating Least Squares that uses a series of broadcast-joins, which can be executed efficiently with MapReduce. We show empirically that the performance of our solution is suitable for real-world use cases. We present experiments on two publicly available datasets and on a synthetic dataset termed Bigflix, generated from the Netflix dataset. Bigflix contains 25 million users and more than 5 billion ratings, mimicking the data sizes recently reported for Netflix's production workload. We demonstrate that our approach can run one iteration of Alternating Least Squares in six minutes on this dataset. Our implementation has been contributed to the open source machine learning library Apache Mahout.
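The broadcast-join idea can be illustrated with a minimal single-process sketch. It is not the Mahout implementation: the function names (`solve`, `recompute_factors`, `als_half_iteration`), the toy ratings, and the plain L2 regularization with weight `lam` are assumptions made for illustration. Each partition of the rating matrix is processed independently against a complete copy of the fixed factor matrix, which stands in for broadcasting that matrix to every map task.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def recompute_factors(partition, fixed, rank, lam):
    """One 'map task': join local rating rows with the broadcast factors `fixed`.

    `partition` maps a row id to its list of (column id, rating) pairs.
    """
    out = {}
    for row_id, ratings in partition.items():
        # Normal equations: (sum_j f_j f_j^T + lam * I) x = sum_j r_j f_j
        a = [[lam if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for col_id, r in ratings:
            f = fixed[col_id]  # lookup in the broadcast copy, no shuffle needed
            for i in range(rank):
                b[i] += r * f[i]
                for j in range(rank):
                    a[i][j] += f[i] * f[j]
        out[row_id] = solve(a, b)
    return out


def als_half_iteration(partitions, fixed, rank, lam):
    """Recompute one side's factors; partitions never exchange data."""
    result = {}
    for partition in partitions:  # each loop body would be a separate map task
        result.update(recompute_factors(partition, fixed, rank, lam))
    return result


def transpose(partitions):
    """Regroup the ratings by column id (returned as a single partition)."""
    cols = {}
    for part in partitions:
        for row_id, ratings in part.items():
            for col_id, r in ratings:
                cols.setdefault(col_id, []).append((row_id, r))
    return [cols]


def objective(partitions, users, items, lam):
    """Regularized squared error over the observed ratings."""
    err = sum((r - sum(p * q for p, q in zip(users[u], items[i]))) ** 2
              for part in partitions for u, rs in part.items() for i, r in rs)
    reg = sum(x * x for f in list(users.values()) + list(items.values()) for x in f)
    return err + lam * reg


# Toy ratings (made up), split into two user partitions: user -> [(item, rating)]
by_user = [
    {0: [(0, 5.0), (1, 3.0)], 1: [(0, 4.0), (2, 1.0)]},
    {2: [(1, 2.0), (2, 5.0)], 3: [(0, 1.0), (1, 4.0), (2, 2.0)]},
]
by_item = transpose(by_user)

rank, lam = 2, 0.1
rng = random.Random(0)
items = {i: [rng.random() for _ in range(rank)] for i in range(3)}
users = als_half_iteration(by_user, items, rank, lam)
start = objective(by_user, users, items, lam)
for _ in range(5):
    items = als_half_iteration(by_item, users, rank, lam)
    users = als_half_iteration(by_user, items, rank, lam)
end = objective(by_user, users, items, lam)
print(start, end)
```

Because each row's factor vector depends only on its own ratings and the broadcast matrix, the result does not depend on how the rows are partitioned. That independence is what lets a half-iteration run as a map-only job in this sketch. The approach presupposes that the fixed factor matrix fits in the memory of each worker.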