Large-scale matrix factorization with distributed stochastic gradient descent

Authors:
Rainer Gemulla;Erik Nijkamp;Peter J. Haas;Yannis Sismanis
Affiliations:
Max-Planck-Institut für Informatik, Saarbrücken, Germany;IBM Almaden Research Center, San Jose, CA, USA;IBM Almaden Research Center, San Jose, CA, USA;IBM Almaden Research Center, San Jose, CA, USA
Venue:
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2011

Abstract

We provide a novel algorithm to approximately factor large matrices with millions of rows, millions of columns, and billions of nonzero elements. Our approach rests on stochastic gradient descent (SGD), an iterative stochastic optimization algorithm. We first develop a novel "stratified" SGD variant (SSGD) that applies to general loss-minimization problems in which the loss function can be expressed as a weighted sum of "stratum losses." We establish sufficient conditions for convergence of SSGD using results from stochastic approximation theory and regenerative process theory. We then specialize SSGD to obtain a new matrix-factorization algorithm, called DSGD, that can be fully distributed and run on web-scale datasets using, e.g., MapReduce. DSGD can handle a wide variety of matrix factorizations. We describe the practical techniques used to optimize performance in our DSGD implementation. Experiments suggest that DSGD converges significantly faster and has better scalability properties than alternative algorithms.
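
The abstract does not spell out how the strata are built, so the following is only a minimal single-machine sketch of the general idea, not the authors' DSGD implementation. It assumes a squared-error loss over observed entries, a factorization V ≈ WH, and a blocking scheme in which rows and columns are split into the same number of groups and stratum s consists of the blocks (b, (b + s) mod blocks). The function names, the blocking scheme, the step size, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def stratified_sgd_mf(rows, cols, vals, shape, rank=2, blocks=2,
                      epochs=200, step=0.05, reg=0.0, seed=0):
    """Illustrative stratified SGD for V ~ W @ H on observed entries only."""
    rng = np.random.default_rng(seed)
    m, n = shape
    W = rng.normal(scale=0.1, size=(m, rank))
    H = rng.normal(scale=0.1, size=(rank, n))
    # Assumed scheme: contiguous row groups and column groups of similar size.
    row_block = np.arange(m) * blocks // m
    col_block = np.arange(n) * blocks // n
    rb, cb = row_block[rows], col_block[cols]
    for _ in range(epochs):
        # Visit the strata in random order; each stratum is processed alone.
        for s in rng.permutation(blocks):
            # Blocks inside one stratum share no rows of W and no columns of H,
            # so this loop over b could run in parallel. Here it is sequential.
            for b in range(blocks):
                idx = np.flatnonzero((rb == b) & (cb == (b + s) % blocks))
                rng.shuffle(idx)
                for k in idx:
                    i, j = rows[k], cols[k]
                    err = vals[k] - W[i] @ H[:, j]
                    w_old = W[i].copy()  # use the pre-update row for H's step
                    W[i] += step * (err * H[:, j] - reg * W[i])
                    H[:, j] += step * (err * w_old - reg * H[:, j])
    return W, H

def rmse(rows, cols, vals, W, H):
    """Root-mean-square error of W @ H on the given entries."""
    pred = np.einsum("kr,rk->k", W[rows], H[:, cols])
    return float(np.sqrt(np.mean((vals - pred) ** 2)))

# Synthetic rank-2 matrix with roughly 80% of its entries observed.
data_rng = np.random.default_rng(1)
V = data_rng.random((20, 2)) @ data_rng.random((2, 15))
obs_rows, obs_cols = np.nonzero(data_rng.random(V.shape) < 0.8)
obs_vals = V[obs_rows, obs_cols]

# epochs=0 returns the random initial factors, which gives a baseline error.
W0, H0 = stratified_sgd_mf(obs_rows, obs_cols, obs_vals, V.shape, epochs=0)
W, H = stratified_sgd_mf(obs_rows, obs_cols, obs_vals, V.shape)
rmse_before = rmse(obs_rows, obs_cols, obs_vals, W0, H0)
rmse_after = rmse(obs_rows, obs_cols, obs_vals, W, H)
print("training RMSE before:", rmse_before, "after:", rmse_after)
```

In this sketch, updates from different blocks of the same stratum never touch the same factor rows or columns, so those blocks could be handed to separate workers without coordination during a stratum. That independence is the property a distributed variant would rely on.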