High performance combinatorial algorithm design on the Cell Broadband Engine processor

Authors:
David A. Bader;Virat Agarwal;Kamesh Madduri;Seunghwa Kang
Affiliations:
Georgia Institute of Technology, Atlanta, GA 30332, United States;Georgia Institute of Technology, Atlanta, GA 30332, United States;Georgia Institute of Technology, Atlanta, GA 30332, United States;Georgia Institute of Technology, Atlanta, GA 30332, United States
Venue:
Parallel Computing
Year:
2007

Abstract

The Sony-Toshiba-IBM Cell Broadband Engine (Cell/B.E.) is a heterogeneous multicore architecture that consists of a traditional microprocessor (PPE) with eight SIMD co-processing units (SPEs) integrated on-chip. While the Cell/B.E. processor is architected for multimedia applications with regular processing requirements, we are interested in its performance on problems with non-uniform memory access patterns. In this article, we present two case studies to illustrate the design and implementation of parallel combinatorial algorithms on the Cell/B.E.: list ranking, a fundamental kernel for graph problems, and zlib, a data compression and decompression library. List ranking is a particularly challenging problem to parallelize on current cache-based and distributed memory architectures due to its low computational intensity and irregular memory access patterns. To tolerate memory latency on the Cell/B.E. processor, we decompose work into several independent tasks and coordinate computation using the novel idea of Software-Managed threads (SM-Threads). We apply this generic SPE work-partitioning technique to efficiently implement list ranking, and demonstrate substantial speedup in comparison to traditional cache-based microprocessors. For instance, on a 3.2 GHz IBM QS20 Cell/B.E. blade, for a random linked list of 1 million nodes, we achieve an overall speedup of 8.34 over a PPE-only implementation. Our second case study, zlib, is a data compression/decompression library that is extensively used in both scientific and general-purpose computing. The core kernels in the zlib library are the LZ77 longest subsequence matching algorithm and Huffman data encoding. We design efficient parallel algorithms for these combinatorial kernels, and exploit concurrency at multiple levels on the Cell/B.E. processor. We also present a Cell/B.E.-optimized implementation of gzip, a popular file-compression application based on the zlib library. For our Cell/B.E. implementation of gzip, we achieve an average speedup of 2.9 in compression over current workstations.
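
To make the list ranking problem concrete, the following is a minimal sketch in Python and is not taken from the paper. It assumes the list is stored as a successor array `succ` with `-1` marking the tail, and it illustrates one common way to decompose the work into independent tasks: pick splitter nodes, rank each resulting sublist on its own, then combine the sublist lengths with a prefix sum. The function names, the `num_sublists` parameter, and the random choice of splitters are illustrative assumptions; the sketch runs sequentially and does not model SPEs, DMA transfers, or SM-Threads.

```python
import random


def rank_sequential(succ, head):
    """Reference ranking: rank[i] is the distance of node i from the head."""
    rank = [0] * len(succ)
    node, r = head, 0
    while node != -1:
        rank[node] = r
        r += 1
        node = succ[node]
    return rank


def rank_by_sublists(succ, head, num_sublists, seed=0):
    """Rank a non-empty list by splitting it into independently ranked sublists."""
    n = len(succ)
    rng = random.Random(seed)

    # Hypothetical splitter choice: the head plus randomly chosen other nodes.
    others = [i for i in range(n) if i != head]
    splitters = [head] + rng.sample(others, min(num_sublists - 1, len(others)))
    splitter_index = {s: k for k, s in enumerate(splitters)}

    local = [0] * n                       # rank of each node within its sublist
    sub_of = [0] * n                      # sublist that each node belongs to
    length = [0] * len(splitters)         # number of nodes in each sublist
    next_sub = [-1] * len(splitters)      # sublist that follows in list order

    # Step 1: each sublist walk touches disjoint nodes, so the iterations of
    # this loop are independent tasks that could run concurrently.
    for k, start in enumerate(splitters):
        node, r = start, 0
        while True:
            local[node] = r
            sub_of[node] = k
            r += 1
            node = succ[node]
            if node == -1 or node in splitter_index:
                break
        length[k] = r
        next_sub[k] = splitter_index[node] if node != -1 else -1

    # Step 2: prefix sum of sublist lengths, following the sublists in list order.
    offset = [0] * len(splitters)
    k, total = 0, 0                       # sublist 0 starts at the head
    while k != -1:
        offset[k] = total
        total += length[k]
        k = next_sub[k]

    # Step 3: global rank = offset of the sublist + local rank inside it.
    return [offset[sub_of[i]] + local[i] for i in range(n)]


if __name__ == "__main__":
    # List order is 0 -> 3 -> 2 -> 1.
    succ = [3, -1, 1, 2]
    print(rank_by_sublists(succ, head=0, num_sublists=2))  # [0, 3, 2, 1]
```

The walk in step 1 follows successor pointers to unpredictable addresses, which is the irregular memory access pattern the abstract refers to; having many independent sublists is what gives a latency-tolerant implementation something else to work on while one access is outstanding.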