In this paper, we describe our approach to using the compute power of the Cell Broadband Engine™ (Cell/B.E.) processor as an accelerator for the computationally intensive portions of high performance computing applications. We call this approach "hybrid programming" because it distributes application execution across heterogeneous processors. IBM developed a hardware implementation and software infrastructure that enable this hybrid computing model as part of the Roadrunner project for Los Alamos National Laboratory (LANL). In the hybrid programming model, a process running on a host processor, such as an x86_64 architecture processor, creates an accelerator process on an accelerator processor, such as the IBM® PowerXCell™ 8i. The PowerXCell 8i is a new implementation of the Cell Broadband Engine architecture. The host process then schedules compute-intensive operations onto the accelerator process. The host and accelerator processes can continue executing concurrently and synchronize when needed to transfer results or schedule new accelerator computation. We describe the Data Communication and Synchronization (DaCS) Library and the Accelerated Library Framework (ALF), which are designed to let developers create new applications and adapt existing applications to exploit hybrid computing platforms. We also describe our experience in using these frameworks to construct hybrid versions of the familiar Linpack benchmark and of an implicit Monte Carlo radiation transport application named Milagro. We present performance measurements on prototype hardware that show the improvements achieved to date, along with projections of the expected performance on the final Roadrunner system.
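The control flow of the hybrid model described above (create an accelerator worker, schedule compute-intensive operations onto it, keep the host running, synchronize when results are needed) can be illustrated with a minimal sketch. This is not the DaCS or ALF API. It assumes a single worker thread from Python's standard `concurrent.futures` module as a stand-in for the accelerator process, and the names `accelerated_kernel`, `host_side_work`, and `run_hybrid` are hypothetical, chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def accelerated_kernel(block):
    """Hypothetical compute-intensive kernel that would be offloaded."""
    return sum(x * x for x in block)


def host_side_work(n):
    """Hypothetical work the host keeps for itself while the offload runs."""
    return n * (n - 1) // 2


def run_hybrid(blocks):
    # Assumption: one worker thread stands in for the single accelerator
    # process that the host creates in the hybrid model.
    with ThreadPoolExecutor(max_workers=1) as accelerator:
        # Schedule the compute-intensive operations; submit() returns at once.
        pending = [accelerator.submit(accelerated_kernel, b) for b in blocks]
        # The host continues executing concurrently with the accelerator.
        host_result = host_side_work(len(blocks))
        # Synchronize only when the accelerator results are needed.
        accel_results = [future.result() for future in pending]
    return host_result, accel_results


if __name__ == "__main__":
    print(run_hybrid([[1, 2, 3], [4, 5]]))
```

On real hybrid hardware the worker would be a separate process on a different processor architecture, so scheduling work and collecting results would also involve explicit data transfer between host and accelerator memory, which this sketch leaves out.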