Proceedings of the 1989 ACM/IEEE conference on Supercomputing
Linear scan register allocation
ACM Transactions on Programming Languages and Systems (TOPLAS)
Dynamo: a transparent dynamic optimization system
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Compilation and delayed evaluation in APL
POPL '78 Proceedings of the 5th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
Tentative compilation: A design for an APL compiler
APL '79 Proceedings of the international conference on APL: part 1
An apl machine
HotpathVM: an effective JIT compiler for resource-constrained devices
Proceedings of the 2nd international conference on Virtual execution environments
Compiling for stream processing
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Interpreting the data: Parallel analysis with Sawzall
Scientific Programming - Dynamic Grids and Worldwide Computing
Stream fusion: from lists to streams to nothing at all
ICFP '07 Proceedings of the 12th ACM SIGPLAN international conference on Functional programming
Larrabee: a many-core x86 architecture for visual computing
ACM SIGGRAPH 2008 papers
Harnessing the Multicores: Nested Data Parallelism in Haskell
APLAS '08 Proceedings of the 6th Asian Symposium on Programming Languages and Systems
Factored operating systems (fos): the case for a scalable operating system for multicores
ACM SIGOPS Operating Systems Review
Trace-based just-in-time type specialization for dynamic languages
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Lazy binary-splitting: a run-time adaptive work-stealing scheduler
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Code analysis and parallelizing vector operations in R
Computational Statistics - Proceedings of DSC 2007
Regular, shape-polymorphic, parallel arrays in Haskell
Proceedings of the 15th ACM SIGPLAN international conference on Functional programming
Inline caching meets quickening
ECOOP'10 Proceedings of the 24th European conference on Object-oriented programming
Copperhead: compiling an embedded data parallel language
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
McFLAT: a profile-based framework for MATLAB loop analysis and transformations
LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
Staged static techniques to efficiently implement array copy semantics in a MATLAB JIT compiler
CC'11/ETAPS'11 Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software
Scalable aggregation on multicore processors
Proceedings of the Seventh International Workshop on Data Management on New Hardware
Optimizing MATLAB through just-in-time specialization
CC'10/ETAPS'10 Proceedings of the 19th joint European conference on Theory and Practice of Software, international conference on Compiler Construction
Intel's Array Building Blocks: A retargetable, dynamic compiler and embedded language
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Evaluating the design of the R language: objects and functions for data analysis
ECOOP'12 Proceedings of the 26th European conference on Object-Oriented Programming
Optimizing R VM: Allocation Removal and Path Length Reduction via Interpreter-level Specialization
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
A fast abstract syntax tree interpreter for R
Proceedings of the 10th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Hi-index | 0.00 |
There is a growing utilization gap between modern hardware and modern programming languages for data analysis.Due to power and other constraints, recent processor design has sought improved performance through increased SIMD and multi-core parallelism. At the same time, high-level, dynamically-typed languages for data analysis have become popular. These languages emphasize ease of use and high productivity, but have, in general, low performance and limited support for exploiting hardware parallelism. In this paper, we describe Riposte, a new runtime for the R language, which bridges this gap. Riposte uses tracing, a technique commonly used to accelerate scalar code, to dynamically discover and extract sequences of vector operations from arbitrary R code. Once extracted, we can fuse traces to eliminate unnecessary memory traffic, compile them to use hardware SIMD units, and schedule them to run across multiple cores, allowing us to fully utilize the available parallelism on modern shared-memory machines. Our evaluation shows that Riposte can run vector R code near the speed of hand-optimized C, 5--50x faster than the open source implementation of R, and can also linearly scale to 32 cores for some tasks. Across 12 different workloads we achieve an overall average speed-up of over 150x without explicit programmer parallelization.