Distributed matrix factorization with MapReduce using a series of broadcast-joins

Authors:
Sebastian Schelter;Christoph Boden;Martin Schenck;Alexander Alexandrov;Volker Markl
Affiliations:
Technische Universität Berlin, Berlin, Germany (all authors)
Venue:
Proceedings of the 7th ACM conference on Recommender systems
Year:
2013

Abstract

The efficient, distributed factorization of large matrices on clusters of commodity machines is crucial to applying latent factor models in industrial-scale recommender systems. We propose a data-parallel, low-rank matrix factorization based on Alternating Least Squares (ALS) that uses a series of broadcast-joins, which can be executed efficiently with MapReduce. We show empirically that the performance of our solution is suitable for real-world use cases. We present experiments on two publicly available datasets and on a synthetic dataset termed Bigflix, generated from the Netflix dataset. Bigflix contains 25 million users and more than 5 billion ratings, mimicking data sizes recently reported as Netflix's production workload. We demonstrate that our approach runs one iteration of Alternating Least Squares in six minutes on this dataset. Our implementation has been contributed to the open source machine learning library Apache Mahout.
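To make the idea in the abstract concrete, the following is a minimal single-process sketch of ALS organised as a series of broadcast-joins. It is not the authors' code and not the Apache Mahout implementation. It assumes that the ratings are split into row partitions (each standing in for one mapper), that the currently fixed factor matrix is small enough to be copied to every partition, and that a plain `lam * I` regularizer is acceptable. All function names and parameters (`partition_rows`, `solve_rows`, `als`, `rmse`, `n_parts`, `lam`) are hypothetical.

```python
import numpy as np


def partition_rows(ratings, key, n_parts):
    """Group (user, item, rating) triples by user (key=0) or item (key=1)
    and spread the resulting rows over n_parts partitions."""
    parts = [dict() for _ in range(n_parts)]
    for triple in ratings:
        row, col = triple[key], triple[1 - key]
        parts[row % n_parts].setdefault(row, []).append((col, triple[2]))
    return parts


def solve_rows(partition, fixed, rank, lam):
    """Map-side step for one partition: `fixed` plays the role of the
    broadcast factor matrix, so every row is solved locally without
    exchanging data with other partitions."""
    solved = {}
    for row, entries in partition.items():
        cols = np.array([c for c, _ in entries])
        vals = np.array([r for _, r in entries], dtype=float)
        F = fixed[cols]                        # join ratings with broadcast factors
        A = F.T @ F + lam * np.eye(rank)       # assumed regularizer: lam * I
        solved[row] = np.linalg.solve(A, F.T @ vals)
    return solved


def als(ratings, n_users, n_items, rank=2, lam=0.1, iters=20, n_parts=2, seed=0):
    """Alternate between recomputing user factors (item factors broadcast)
    and item factors (user factors broadcast)."""
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(n_users, rank))
    M = rng.normal(scale=0.1, size=(n_items, rank))
    by_user = partition_rows(ratings, 0, n_parts)
    by_item = partition_rows(ratings, 1, n_parts)
    for _ in range(iters):
        for part in by_user:                   # each pass simulates one mapper
            for u, vec in solve_rows(part, M, rank, lam).items():
                U[u] = vec
        for part in by_item:
            for i, vec in solve_rows(part, U, rank, lam).items():
                M[i] = vec
    return U, M


def rmse(ratings, U, M):
    """Root mean squared error of the factorization on the given ratings."""
    err = [r - U[u] @ M[i] for u, i, r in ratings]
    return float(np.sqrt(np.mean(np.square(err))))


# Toy data: (user, item, rating) triples for 4 users and 3 items.
RATINGS = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 1),
           (2, 2, 5), (3, 0, 1), (3, 1, 2), (3, 2, 4)]

if __name__ == "__main__":
    U, M = als(RATINGS, n_users=4, n_items=3)
    print("training RMSE:", rmse(RATINGS, U, M))
```

In this sketch the two inner loops over partitions are independent of each other, which is what would allow each half-iteration to run as a map-only job once the fixed factor matrix has been broadcast. Whether the factor matrix fits into a worker's memory depends on the number of rows and the chosen rank, and is an assumption here.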