We present a compiler for machines with an explicitly managed memory hierarchy and suggest that a primary role of any compiler for such architectures is to manipulate and schedule a hierarchy of bulk operations at varying scales of the application and of the machine. We evaluate the performance of our compiler using several benchmarks running on a Cell processor.
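The idea of scheduling bulk operations can be illustrated with a minimal sketch. It assumes a single level of explicitly managed memory (for example, a Cell SPE local store) and a three-stage load/compute/store software pipeline. `BulkOp`, `split_into_blocks`, and `pipelined_schedule` are hypothetical names chosen for illustration, not the compiler's actual API or algorithm. The sketch shows how blocks sized to the local store can be transferred and processed so that transfers of one block overlap computation on another.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical model: a "bulk operation" moves or processes one whole block
# of data between main memory and a small, explicitly managed local store.
@dataclass(frozen=True)
class BulkOp:
    kind: str    # "load" (main -> local), "compute" (on local), "store" (local -> main)
    block: int   # index of the data block the operation touches
    buffer: int  # local-store buffer slot the block occupies


def split_into_blocks(n_elems: int, capacity: int) -> List[Tuple[int, int]]:
    """Partition [0, n_elems) into half-open ranges of at most `capacity` elements."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return [(lo, min(lo + capacity, n_elems)) for lo in range(0, n_elems, capacity)]


def pipelined_schedule(n_blocks: int) -> List[List[BulkOp]]:
    """Software-pipeline load/compute/store over n_blocks blocks.

    Stage s loads block s, computes block s-1 and stores block s-2, so the
    operations within one stage touch different blocks and may overlap
    (e.g. two transfers in flight while the core computes). Three buffer
    slots (block % 3) keep in-flight blocks from overwriting each other.
    """
    if n_blocks <= 0:
        return []
    stages: List[List[BulkOp]] = []
    for s in range(n_blocks + 2):  # +2 stages to drain the pipeline
        stage = []
        for kind, b in (("load", s), ("compute", s - 1), ("store", s - 2)):
            if 0 <= b < n_blocks:
                stage.append(BulkOp(kind, b, b % 3))
        stages.append(stage)
    return stages


if __name__ == "__main__":
    blocks = split_into_blocks(10, 4)  # [(0, 4), (4, 8), (8, 10)]
    for s, stage in enumerate(pipelined_schedule(len(blocks))):
        print(s, [(op.kind, op.block) for op in stage])
```

For a deeper hierarchy, the same decomposition could be reapplied inside each compute operation using the next level's capacity, yielding nested bulk operations at several scales. This nesting is an assumption of the sketch rather than a description of how the compiler performs it.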