Multi-core processors with explicitly managed local memories provide advanced capabilities to optimize data caching and prefetching in software. Unfortunately, these capabilities are neither easily accessible to programmers nor exploited to their full potential by current language, compiler, or runtime frameworks. We present Strider, a runtime framework for optimizing compilers on multi-core processors with software-managed memories. Given a high-level specification of loops and their data access patterns, Strider transparently optimizes the grouping, decomposition, and scheduling of explicit software-managed accesses to multi-dimensional arrays in nested loops. In particular, Strider contributes new methods to improve temporal locality, to optimize the critical path of scheduling data transfers for multi-stride accesses in regular nested parallel loops, and to distribute accesses between cores. The Strider prototype on the IBM Cell processor performs competitively with hand-optimized code and better than contemporary language frameworks, in both non-trivial parallel applications and important application kernels.