Optimizing Compiler for the CELL Processor

  • Authors:
  • Alexandre E. Eichenberger;Kathryn O'Brien;Kevin O'Brien;Peng Wu;Tong Chen;Peter H. Oden;Daniel A. Prener;Janice C. Shepherd;Byoungro So;Zehra Sura;Amy Wang;Tao Zhang;Peng Zhao;Michael Gschwind

  • Affiliations:
  • IBM T.J. Watson Research Center Yorktown Heights, New York, USA.;IBM T.J. Watson Research Center Yorktown Heights, New York, USA.;IBM T.J. Watson Research Center Yorktown Heights, New York, USA.;IBM T.J. Watson Research Center Yorktown Heights, New York, USA.;IBM T.J. Watson Research Center Yorktown Heights, New York, USA.;IBM T.J. Watson Research Center Yorktown Heights, New York, USA.;IBM T.J. Watson Research Center Yorktown Heights, New York, USA.;IBM T.J. Watson Research Center Yorktown Heights, New York, USA.;IBM T.J. Watson Research Center Yorktown Heights, New York, USA.;IBM T.J. Watson Research Center Yorktown Heights, New York, USA.;IBM T.J. Watson Research Center Yorktown Heights, New York, USA.;College of Computing Georgia Tech, USA.;IBM Toronto Laboratory Markham, Ontario, Canada.;IBM Toronto Laboratory Markham, Ontario, Canada.

  • Venue:
  • Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
  • Year:
  • 2005

Abstract

Developed for multimedia and game applications, as well as other numerically intensive workloads, the CELL processor supports both highly parallel codes, which have high computation and memory requirements, and scalar codes, which require fast response times and a full-featured programming environment. The first-generation CELL processor implements, on a single chip, a Power Architecture processor with two levels of cache and eight attached streaming processors, each with its own local memory and a globally coherent DMA engine. In addition to processor-level parallelism, each processing element has a Single Instruction Multiple Data (SIMD) unit that can process from 2 double-precision floating-point values up to 16 bytes per instruction. This paper describes, in the context of a research prototype, several compiler techniques that aim to automatically generate high-quality code across the wide range of heterogeneous parallelism available on the CELL processor. The techniques include compiler-supported branch prediction, compiler-assisted instruction fetch, generation of scalar code on SIMD units, automatic generation of SIMD code, and data and code partitioning across the multiple processor elements in the system. Results indicate that significant speedup can be achieved with a high level of support from the compiler.
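
The SIMD figures in the abstract (2 double-precision values up to 16 bytes per instruction) are consistent with a 16-byte vector register. As a rough illustration of what automatic generation of SIMD code involves, the sketch below models a strip-mined loop in plain Python: full vectors are handled in the main loop and leftover elements in a scalar remainder loop. The 16-byte width is inferred from the abstract, and the names `VECTOR_BYTES`, `lanes_per_vector`, and `simd_add` are hypothetical. This is a conceptual model only, not the paper's algorithm and not code that runs on the processor.

```python
# Assumption: a 16-byte SIMD register, inferred from the abstract's
# "2 double-precision values up to 16 bytes per instruction".
VECTOR_BYTES = 16


def lanes_per_vector(elem_bytes):
    """Return how many elements of the given size fit in one SIMD register."""
    if elem_bytes <= 0 or VECTOR_BYTES % elem_bytes != 0:
        raise ValueError("element size must evenly divide the vector width")
    return VECTOR_BYTES // elem_bytes


def simd_add(a, b, elem_bytes):
    """Add two equal-length lists, modelling a strip-mined SIMD loop.

    Returns (result, vector_ops, scalar_ops), where vector_ops counts
    modelled SIMD additions and scalar_ops counts remainder additions.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    lanes = lanes_per_vector(elem_bytes)
    n = len(a)
    main_end = n - n % lanes  # last index covered by full vectors

    result = []
    vector_ops = 0
    # Main loop: each iteration stands for one SIMD instruction.
    for i in range(0, main_end, lanes):
        result.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        vector_ops += 1

    scalar_ops = 0
    # Remainder loop: elements that do not fill a whole vector.
    for i in range(main_end, n):
        result.append(a[i] + b[i])
        scalar_ops += 1

    return result, vector_ops, scalar_ops
```

For example, `simd_add(list(range(10)), [1] * 10, 4)` models ten 4-byte elements: two vector operations cover the first eight elements and two scalar operations cover the rest. A real compiler would also have to deal with alignment and data dependences, which this sketch ignores.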