Accelerating computing with the cell broadband engine processor

Authors:
Catherine H. Crawford;Paul Henning;Michael Kistler;Cornell Wright
Affiliations:
IBM Corporation, Bedford, NH, USA;Los Alamos National Laboratory, Los Alamos, NM, USA;IBM Corporation, Austin, TX, USA;IBM Corporation, Austin, TX, USA
Venue:
Proceedings of the 5th conference on Computing frontiers
Year:
2008

Abstract

In this paper, we describe our approach to using the compute power of the Cell Broadband Engine™ (Cell/B.E.) processor as an accelerator for computationally intensive portions of high-performance computing applications. We call this approach "hybrid programming" because it distributes application execution across heterogeneous processors. IBM developed a hardware implementation and software infrastructure that enables this hybrid computing model as part of the Roadrunner project for Los Alamos National Laboratory (LANL). In the hybrid programming model, a process running on a host processor, such as an x86_64 architecture processor, creates an accelerator process on an accelerator processor, such as the IBM® PowerXCell™ 8i. The PowerXCell 8i is a new implementation of the Cell Broadband Engine architecture. The host process then schedules compute-intensive operations onto the accelerator process. The host and accelerator processes can continue execution concurrently and synchronize when needed to transfer results or schedule new accelerator computation. We describe the Data Communication and Synchronization (DaCS) Library and the Accelerated Library Framework (ALF), which are designed to allow developers to create new applications and adapt existing applications to exploit hybrid computing platforms. We also describe our experience in using these frameworks to construct hybrid versions of the familiar Linpack benchmark and an implicit Monte Carlo radiation transport application named Milagro. We present performance measurements on prototype hardware that show the improvements achieved to date, along with projections of the expected performance on the final Roadrunner system.
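The control flow of the hybrid programming model described in the abstract can be illustrated with a minimal sketch. It does not use DaCS or ALF; a single-worker thread pool from the Python standard library stands in for the accelerator process, and the names `accelerator_kernel`, `host_side_work`, and `run_hybrid` are hypothetical, chosen only for illustration. The sketch shows the three steps the abstract names: the host schedules a compute-intensive operation, continues its own execution concurrently, and synchronizes when it needs the result.

```python
from concurrent.futures import ThreadPoolExecutor


def accelerator_kernel(block):
    """Hypothetical compute-intensive kernel: sum of squares of one block."""
    return sum(x * x for x in block)


def host_side_work(block):
    """Hypothetical lighter work that the host keeps for itself."""
    return len(block)


def run_hybrid(blocks):
    """Offload one kernel call per block and overlap it with host work."""
    total = 0
    count = 0
    # Assumption: one worker stands in for the single accelerator process
    # that the host process creates in the hybrid model.
    with ThreadPoolExecutor(max_workers=1) as accelerator:
        for block in blocks:
            # Schedule the compute-intensive operation on the accelerator.
            pending = accelerator.submit(accelerator_kernel, block)
            # The host continues executing while the accelerator works.
            count += host_side_work(block)
            # Synchronize only when the result is needed.
            total += pending.result()
    return total, count


if __name__ == "__main__":
    print(run_hybrid([[1, 2, 3], [4, 5]]))
```

On a real hybrid platform the host and accelerator are separate processors with separate memories, so the submit and result steps would also involve explicit data transfer, which this sketch leaves out.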