MapReduce is emerging as a generic parallel programming paradigm for large clusters of machines. This trend, combined with the growing need to run machine learning (ML) algorithms on massive datasets, has led to an increased interest in implementing ML algorithms on MapReduce. However, the cost of implementing a large class of ML algorithms as low-level MapReduce jobs on varying data and machine cluster sizes can be prohibitive. In this paper, we propose SystemML, in which ML algorithms are expressed in a higher-level language and are compiled and executed in a MapReduce environment. This higher-level language exposes several constructs, including linear algebra primitives that constitute key building blocks for a broad class of supervised and unsupervised ML algorithms. The algorithms expressed in SystemML are compiled and optimized into a set of MapReduce jobs that can run on a cluster of machines. We describe and empirically evaluate a number of optimization strategies for efficiently executing these algorithms on Hadoop, an open-source MapReduce implementation. We report an extensive performance evaluation of three ML algorithms across varying data and cluster sizes.
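The abstract states that linear algebra primitives are compiled into MapReduce jobs. As a rough illustration only, the sketch below simulates in a single Python process how one such primitive, a sparse matrix product C = AB, can be expressed as two MapReduce jobs. The first job joins entries of A and B on the shared index k and emits partial products. The second job sums the partial products for each output cell. This is not SystemML's compiler or its actual operator implementations. The helper names (`run_mapreduce`, `matmul_mapreduce`) and the `(row, col, value)` triple format are hypothetical choices made for this example.

```python
from collections import defaultdict
from itertools import chain


def run_mapreduce(records, mapper, reducer):
    """Simulate one MapReduce job in-process: map, shuffle (group by key), reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reducers see each key once, with all values emitted for that key.
    return list(chain.from_iterable(reducer(k, vs) for k, vs in sorted(groups.items())))


def join_mapper(record):
    """Key A(i, k) entries by their column k and B(k, j) entries by their row k."""
    tag, r, c, v = record
    if tag == "A":
        yield c, ("A", r, v)
    else:
        yield r, ("B", c, v)


def join_reducer(k, values):
    """For a shared index k, emit partial products A(i, k) * B(k, j) keyed by (i, j)."""
    a_col = [(i, v) for tag, i, v in values if tag == "A"]
    b_row = [(j, v) for tag, j, v in values if tag == "B"]
    for i, av in a_col:
        for j, bv in b_row:
            yield (i, j), av * bv


def sum_reducer(cell, partials):
    """Aggregate all partial products for one output cell C(i, j)."""
    yield cell, sum(partials)


def matmul_mapreduce(a, b):
    """Multiply sparse matrices given as (row, col, value) triples; returns {(i, j): value}."""
    tagged = [("A", i, k, v) for i, k, v in a] + [("B", k, j, v) for k, j, v in b]
    partials = run_mapreduce(tagged, join_mapper, join_reducer)       # job 1: join on k
    cells = run_mapreduce(partials, lambda rec: [rec], sum_reducer)   # job 2: sum per cell
    return dict(cells)


if __name__ == "__main__":
    a = [(0, 0, 1), (0, 1, 2), (1, 0, 3), (1, 1, 4)]  # [[1, 2], [3, 4]]
    b = [(0, 0, 5), (0, 1, 6), (1, 0, 7), (1, 1, 8)]  # [[5, 6], [7, 8]]
    print(matmul_mapreduce(a, b))  # {(0, 0): 19, (0, 1): 22, (1, 0): 43, (1, 1): 50}
```

On a real cluster, the shuffle between map and reduce moves data across machines, and the join job can emit one partial product per matching pair. Plans that block or replicate one operand may therefore be cheaper, depending on data and cluster size. This is the kind of trade-off that optimization strategies like those described in the abstract would need to weigh.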