Distributed Matrix Completion

Authors:
Christina Teflioudi; Faraz Makari; Rainer Gemulla
Venue:
ICDM '12 Proceedings of the 2012 IEEE 12th International Conference on Data Mining
Year:
2012

Abstract

We discuss parallel and distributed algorithms for large-scale matrix completion on problems with millions of rows, millions of columns, and billions of revealed entries. We focus on in-memory algorithms that run on a small cluster of commodity nodes; even very large problems can be handled effectively in such a setup. Our DALS, ASGD, and DSGD++ algorithms are novel variants of the popular alternating least squares and stochastic gradient descent algorithms; they exploit thread-level parallelism, in-memory processing, and asynchronous communication. We provide some guidance on the asymptotic performance of each algorithm and investigate the performance of both our algorithms and previously proposed MapReduce algorithms in large-scale experiments. We found that DSGD++ outperforms competing methods in terms of overall runtime, memory consumption, and scalability. Using DSGD++, we can factor a matrix with 10B entries on 16 compute nodes in around 40 minutes.
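
For orientation, the stochastic gradient descent approach to matrix completion that the abstract builds on can be sketched as below. This is a minimal single-threaded illustration, not the DSGD++, ASGD, or DALS algorithms of the paper: it has no parallelism, no distribution, and no asynchronous communication. The function names (`sgd_complete`, `rmse`), the L2-regularized squared loss, and all hyperparameter values are assumptions chosen for the example.

```python
import random

def sgd_complete(entries, n_rows, n_cols, rank=2, lr=0.05, reg=0.01,
                 epochs=200, seed=0):
    """Fit factors W (n_rows x rank) and H (n_cols x rank) so that
    dot(W[i], H[j]) approximates each revealed entry (i, j, value).

    Plain sequential SGD on a regularized squared loss (an assumed,
    illustrative objective, not the paper's exact formulation).
    """
    rng = random.Random(seed)
    # Small random initialization of both factor matrices.
    W = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    H = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    order = list(entries)
    for _ in range(epochs):
        rng.shuffle(order)  # visit the revealed entries in random order
        for i, j, v in order:
            err = v - sum(W[i][k] * H[j][k] for k in range(rank))
            for k in range(rank):
                w, h = W[i][k], H[j][k]
                # Gradient step on the row factor and the column factor.
                W[i][k] += lr * (err * h - reg * w)
                H[j][k] += lr * (err * w - reg * h)
    return W, H

def rmse(entries, W, H):
    """Root-mean-square error over the given revealed entries."""
    total = 0.0
    for i, j, v in entries:
        pred = sum(wk * hk for wk, hk in zip(W[i], H[j]))
        total += (v - pred) ** 2
    return (total / len(entries)) ** 0.5

# Toy 3 x 2 matrix with one hidden cell; only revealed cells are listed.
revealed = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 2.0), (1, 1, 4.0), (2, 0, 3.0)]
W, H = sgd_complete(revealed, n_rows=3, n_cols=2)
print("training RMSE:", rmse(revealed, W, H))
```

Each update touches only one row of `W` and one row of `H`, which is the property that parallel and distributed SGD variants generally rely on when they process entries from different rows and columns concurrently.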