Scaling up machine learning: parallel and distributed approaches

  • Authors: Ron Bekkerman; Mikhail Bilenko; John Langford

  • Affiliations: LinkedIn; MSR; Y! Research

  • Venue: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '11), Tutorials
  • Year: 2011

Abstract

This tutorial gives a broad view of modern approaches for scaling up machine learning and data mining methods on parallel/distributed platforms. Demand for scaling up machine learning is task-specific: for some tasks it is driven by enormous dataset sizes, for others by model complexity or by the requirement for real-time prediction. Selecting a task-appropriate parallelization platform and algorithm requires understanding their benefits, trade-offs, and constraints. This tutorial focuses on providing an integrated overview of state-of-the-art platforms and algorithm choices. These span a range of hardware options (from FPGAs and GPUs to multi-core systems and commodity clusters), programming frameworks (including CUDA, MPI, MapReduce, and DryadLINQ), and learning settings (e.g., semi-supervised and online learning). The tutorial is example-driven, covering a number of popular algorithms (e.g., boosted trees, spectral clustering, belief propagation) and diverse applications (e.g., recommender systems and object recognition in vision). The tutorial is based on (but not limited to) material from our upcoming Cambridge University Press edited book, which is currently in production. Visit the tutorial website at http://hunch.net/~large_scale_survey/
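
To give a flavor of the data-parallel programming model exemplified by MapReduce, the sketch below (illustrative only, not drawn from the tutorial material) counts words across document shards: a map step produces per-shard partial counts in parallel, and a reduce step merges them. Production frameworks such as Hadoop MapReduce distribute these steps across a cluster; here Python's multiprocessing stands in for the parallel substrate.

    # Minimal MapReduce-style word count sketch (illustrative only).
    from collections import Counter
    from multiprocessing import Pool

    def map_shard(shard):
        # Map step: emit per-shard word counts.
        return Counter(word for line in shard for word in line.split())

    def reduce_counts(partials):
        # Reduce step: merge partial counts into a global count.
        total = Counter()
        for partial in partials:
            total.update(partial)
        return total

    if __name__ == "__main__":
        # Hypothetical input: two shards of toy documents.
        shards = [
            ["machine learning at scale", "parallel machine learning"],
            ["distributed data mining", "scaling up learning"],
        ]
        with Pool(processes=2) as pool:
            partials = pool.map(map_shard, shards)
        print(reduce_counts(partials).most_common(3))

The same map-then-merge pattern underlies many of the algorithms covered in the tutorial, with the reduce step replaced by, e.g., gradient averaging or sufficient-statistics aggregation.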