Riposte: a trace-driven compiler and parallel VM for vector code in R

Authors:
Justin Talbot;Zachary DeVito;Pat Hanrahan
Affiliations:
Stanford University, Stanford, CA, USA;Stanford University, Stanford, CA, USA;Stanford University, Stanford, CA, USA
Venue:
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Year:
2012

Abstract

There is a growing utilization gap between modern hardware and modern programming languages for data analysis. Due to power and other constraints, recent processor design has sought improved performance through increased SIMD and multi-core parallelism. At the same time, high-level, dynamically-typed languages for data analysis have become popular. These languages emphasize ease of use and high productivity, but have, in general, low performance and limited support for exploiting hardware parallelism. In this paper, we describe Riposte, a new runtime for the R language, which bridges this gap. Riposte uses tracing, a technique commonly used to accelerate scalar code, to dynamically discover and extract sequences of vector operations from arbitrary R code. Once traces are extracted, we can fuse them to eliminate unnecessary memory traffic, compile them to use hardware SIMD units, and schedule them to run across multiple cores, allowing us to fully utilize the available parallelism on modern shared-memory machines. Our evaluation shows that Riposte can run vector R code near the speed of hand-optimized C, 5-50x faster than the open source implementation of R, and can also linearly scale to 32 cores for some tasks. Across 12 different workloads we achieve an overall average speed-up of over 150x without explicit programmer parallelization.
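To make the idea of recording vector operations and fusing them more concrete, the following is a minimal sketch in Python. It is not Riposte's implementation and does not reflect its internals: the class `LazyVec` and its methods `of` and `force` are hypothetical names invented for illustration. The sketch only shows the general principle that elementwise operations can be recorded rather than executed, and then evaluated in one pass so that no intermediate vectors are materialized. It makes no attempt at SIMD code generation or multi-core scheduling.

```python
# Illustrative sketch only; not Riposte's implementation.
# All names here (LazyVec, of, force) are hypothetical.
import operator


class LazyVec:
    """A vector whose elementwise operations are recorded instead of executed."""

    def __init__(self, op, args, length):
        self.op = op          # None for a leaf that holds concrete data
        self.args = args      # leaf: the data list; node: (left, right) operands
        self.length = length

    @staticmethod
    def of(data):
        """Wrap concrete data as a leaf vector."""
        data = list(data)
        return LazyVec(None, data, len(data))

    def _record(self, op, other):
        # Record the operation as a new node; no arithmetic happens here.
        if isinstance(other, LazyVec) and other.length != self.length:
            raise ValueError("length mismatch")
        return LazyVec(op, (self, other), self.length)

    def __add__(self, other):
        return self._record(operator.add, other)

    def __mul__(self, other):
        return self._record(operator.mul, other)

    def _at(self, i):
        """Evaluate the recorded expression for element i only."""
        if self.op is None:
            return self.args[i]
        left, right = self.args
        lv = left._at(i)
        rv = right._at(i) if isinstance(right, LazyVec) else right
        return self.op(lv, rv)

    def force(self):
        """Evaluate in a single fused loop; no intermediate vectors are built."""
        return [self._at(i) for i in range(self.length)]


x = LazyVec.of([1.0, 2.0, 3.0])
y = LazyVec.of([10.0, 20.0, 30.0])
z = (x * 2.0 + y) * y   # nothing is computed yet; only the expression is recorded
result = z.force()      # one pass over the data produces the final vector
```

An eager evaluator would allocate a full-length temporary for `x * 2.0` and another for `x * 2.0 + y` before computing the final product. Deferring the work until `force` lets each output element be computed directly from the inputs, which is the memory-traffic saving that the abstract attributes to fusion.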