Incremental gradient descent is a general technique for solving the large class of convex optimization problems that arise in many machine learning tasks. GLADE is a parallel infrastructure for big data analytics that provides a generic task specification interface. In this paper, we present a scalable and efficient parallel solution for incremental gradient descent in GLADE. We provide empirical evidence that our solution is limited only by the physical characteristics of the hardware, makes effective use of the available resources, and achieves maximum scalability. When deployed in the cloud, our solution has the potential to dramatically reduce the cost of complex analytics over massive datasets.
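To make the underlying technique concrete, the following is a minimal sketch of incremental gradient descent on a least-squares objective, updating the model one example at a time. The data, step size, and loop structure here are illustrative assumptions for a single machine; this is not the parallel GLADE implementation the paper describes.

```python
import numpy as np

def incremental_gradient_descent(X, y, step=0.01, epochs=10):
    """Minimize (1/2n) * sum_i (x_i . w - y_i)^2, one example at a time.

    Each update uses the gradient of a single term of the sum, so the
    model is refined incrementally as examples are visited.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in np.random.permutation(n):
            residual = X[i].dot(w) - y[i]   # error on example i
            w -= step * residual * X[i]     # gradient of that single term
    return w

# Toy usage: recover a known linear model from noisy observations.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=1000)
print(incremental_gradient_descent(X, y))   # approximately [2, -1, 0.5]
```

Because each update touches only one data point, the method suits very large datasets that cannot be processed in full-gradient batches; parallelizing these per-example updates across a cluster is the challenge the paper addresses within GLADE.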