Hardware and software architectures for the CELL processor

  • Authors: Peter Hofstee; Michael Day
  • Affiliations: IBM Systems & Technology Group, Austin, TX (both authors)
  • Venue: CODES+ISSS '05: Proceedings of the 3rd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis
  • Year: 2005

Abstract

The Cell processor is a first instance of a new family of processors intended for the broadband era. The processors will find early use in game systems (PlayStation 3™), a variety of other consumer electronics applications, a wide variety of embedded applications, and various forms of computational accelerators. Cell is a non-homogeneous multi-core processor, with one POWER processor core (two threads) dedicated to the operating system and other control functions, and eight synergistic processors optimized for compute-intensive applications.
Cell addresses two of the main limiters to microprocessor performance: increased memory latency, and performance limitations induced by system power limits. Memory latency is addressed by introducing another software-managed level of private "local" memory, in between the private registers and shared system memory. Data is transferred between this local memory and shared memory with asynchronous, cache-coherent DMA commands, and synergistic processor load and store commands access the local store only. This organization of memory makes it possible for the Cell processor to have over 100 memory transactions in flight at the same time, more than enough to cover memory latency. Power limitations are addressed by two main mechanisms: a non-homogeneous multi-core organization, and an ultra-high-frequency design that allows the chip to be operated at 3.2 GHz at low voltage.
The Cell processor supports many of today's programming models by introducing the concept of heterogeneous tasks or threads. Both Power processor and SPE-based threads can be managed by the operating system and effectively utilized by applications, starting with the relatively straightforward function offload model and extending to the more complex single-source heterogeneous parallel programming model.
Cell achieves between one and two orders of magnitude of performance advantage over conventional single-core processors on compute-intensive (32-bit) applications, by permitting programmers and compilers explicit control over instruction scheduling, data movement and the use of a large register file.