Compilation for explicitly managed memory hierarchies

Authors:
Timothy J. Knight;Ji Young Park;Manman Ren;Mike Houston;Mattan Erez;Kayvon Fatahalian;Alex Aiken;William J. Dally;Pat Hanrahan
Affiliations:
Stanford University, Stanford, CA;Stanford University, Stanford, CA;Stanford University, Stanford, CA;Stanford University, Stanford, CA;Stanford University, Stanford, CA;Stanford University, Stanford, CA;Stanford University, Stanford, CA;Stanford University, Stanford, CA;Stanford University, Stanford, CA
Venue:
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Year:
2007

Citing 20
Cited 33

More iteration space tiling

Proceedings of the 1989 ACM/IEEE conference on Supercomputing
Software prefetching

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Co-array Fortran for parallel programming

ACM SIGPLAN Fortran Forum
A fast Fourier transform compiler

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Type systems for distributed data structures

Proceedings of the 27th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
An annotation language for optimizing software libraries

Proceedings of the 2nd conference on Domain-specific languages
Automatically tuned linear algebra software

SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
A Systolic Array Optimizing Compiler

A Systolic Array Optimizing Compiler
The Case for High-Level Parallel Programming in ZPL

IEEE Computational Science & Engineering
Cache-Oblivious Algorithms

FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
A programming system for the imagine media processor

A programming system for the imagine media processor
Programmable Stream Processors

Computer
Optimizing Compiler for the CELL Processor

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
X10: an object-oriented approach to non-uniform cluster computing

OOPSLA '05 Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
ClawHMMER: A Streaming HMMer-Search Implementation

SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Using advanced compiler technology to exploit the performance of the Cell Broadband Engine™ architecture

IBM Systems Journal
Compiling for stream processing

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Sequoia: programming the memory hierarchy

Proceedings of the 2006 ACM/IEEE conference on Supercomputing
CellSs: a programming model for the cell BE architecture

Proceedings of the 2006 ACM/IEEE conference on Supercomputing

Parallelization schemes for memory optimization on the cell processor: a case study of image processing algorithm

MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
A portable runtime interface for multi-level memory hierarchies

Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Visions for application development on hybrid computing systems

Parallel Computing
Orchestrating data transfer for the cell/B.E. processor

Proceedings of the 22nd annual international conference on Supercomputing
Orchestrating the execution of stream programs on multicore platforms

Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Optimizing scientific application loops on stream processors

Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems
Stream Scheduling: A Framework to Manage Bulk Operations in Memory Hierarchies

Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
Certified Reasoning in Memory Hierarchies

APLAS '08 Proceedings of the 6th Asian Symposium on Programming Languages and Systems
CUDA-Lite: Reducing GPU Programming Complexity

Languages and Compilers for Parallel Computing
Evaluation of memory performance on the cell BE with the SARC programming model

Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture
DBDB: optimizing DMA transfer for the cell be architecture

Proceedings of the 23rd international conference on Supercomputing
Tile Reduction: The First Step towards Tile Aware Parallelization in OpenMP

IWOMP '09 Proceedings of the 5th International Workshop on OpenMP: Evolving OpenMP in an Age of Extreme Parallelism
A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures

IWOMP '09 Proceedings of the 5th International Workshop on OpenMP: Evolving OpenMP in an Age of Extreme Parallelism
Achieving high memory performance from heterogeneous architectures with the SARC programming model

Proceedings of the 10th workshop on MEmory performance: DEaling with Applications, systems and architecture
State-of-the-art in heterogeneous computing

Scientific Programming
Data-aware scheduling of legacy kernels on heterogeneous platforms with distributed memory

Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
MapReduce for the cell broadband engine architecture

IBM Journal of Research and Development
Efficient OpenMP data mapping for multicore platforms with vertically stacked memory

Proceedings of the Conference on Design, Automation and Test in Europe
Recursion-driven parallel code generation for multi-core platforms

Proceedings of the Conference on Design, Automation and Test in Europe
Compilation of stream programs for multicore processors that incorporate scratchpad memories

Proceedings of the Conference on Design, Automation and Test in Europe
Efficient OpenMP support and extensions for MPSoCs with explicitly managed memory hierarchy

Proceedings of the Conference on Design, Automation and Test in Europe
Accelerating large-scale DEVS-based simulation on the cell processor

SpringSim '10 Proceedings of the 2010 Spring Simulation Multiconference
Strider: Runtime Support for Optimizing Strided Data Accesses on Multi-Cores with Explicitly Managed Memories

Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Compiler-directed memory management for heterogeneous MPSoCs

Journal of Systems Architecture: the EUROMICRO Journal
Programming the memory hierarchy revisited: supporting irregular parallelism in sequoia

Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Optimizing the exploitation of multicore processors and GPUs with OpenMP and OpenCL

LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
Exploring Multi-Grained Parallelism in Compute-Intensive DEVS Simulations

PADS '10 Proceedings of the 2010 IEEE Workshop on Principles of Advanced and Distributed Simulation
Automatic data distribution for improving data locality on the cell BE architecture

LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
On-chip cache hierarchy-aware tile scheduling for multicore machines

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Multicore acceleration of Discrete Event System Specification systems

Simulation
Parallelization strategies for the points of interests algorithm on the cell processor

ISPA'07 Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications
Implementing OmpSs support for regions of data in architectures with multiple address spaces

Proceedings of the 27th international ACM conference on International conference on supercomputing
An (almost) direct deployment of the Fast Multipole Method on the Cell processor

The Journal of Supercomputing

Abstract

We present a compiler for machines with an explicitly managed memory hierarchy and suggest that a primary role of any compiler for such architectures is to manipulate and schedule a hierarchy of bulk operations at varying scales of the application and of the machine. We evaluate the performance of our compiler using several benchmarks running on a Cell processor.