Hitting the memory wall: implications of the obvious
ACM SIGARCH Computer Architecture News
Piranha: a scalable architecture based on single-chip multiprocessing
Proceedings of the 27th annual international symposium on Computer architecture
Optimizing pipelines for power and performance
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Evaluation of a Multithreaded Architecture for Cellular Computing
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Microarchitecture Optimizations for Exploiting Memory-Level Parallelism
Proceedings of the 31st annual international symposium on Computer architecture
Blue Gene: a vision for protein science using a petaflop supercomputer
IBM Systems Journal - Deep computing for the life sciences
Power Efficient Processor Architecture and The Cell Processor
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Power and performance optimization at the system level
Proceedings of the 2nd conference on Computing frontiers
Optimizing Compiler for the CELL Processor
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Chip multiprocessing and the cell broadband engine
Proceedings of the 3rd conference on Computing frontiers
Introduction to the cell multiprocessor
IBM Journal of Research and Development - POWER5 and packaging
Cell GC: using the cell synergistic processor as a garbage collection coprocessor
Proceedings of the fourth ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Entering the petaflop era: the architecture and performance of Roadrunner
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Radioastronomy Image Synthesis on the Cell/B.E.
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
SPENK: adding another level of parallelism on the cell broadband engine
IFMT '08 Proceedings of the 1st international forum on Next-generation multicore/manycore technologies
Scientific Programming - High Performance Computing with the Cell Broadband Engine
Fractal terrain generation for SIMD architectures
International Journal of Computer Applications in Technology
Global Principal Typing in Partially Commutative Asynchronous Sessions
ESOP '09 Proceedings of the 18th European Symposium on Programming Languages and Systems: Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009
Stream compaction for deferred shading
Proceedings of the Conference on High Performance Graphics 2009
Experiences with Cell-BE and GPU for Tomography
SAMOS '09 Proceedings of the 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
The multikernel: a new OS architecture for scalable multicore systems
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Bias scheduling in heterogeneous multi-core architectures
Proceedings of the 5th European conference on Computer systems
Accelerating 3D nonrigid registration using the cell broadband engine processor
IBM Journal of Research and Development
An orthogonal matching pursuit algorithm for image denoising on the cell broadband engine
PPAM'09 Proceedings of the 8th international conference on Parallel processing and applied mathematics: Part I
Real-Time Adaptive Background Modeling for Multicore Embedded Systems
Journal of Signal Processing Systems
Bridging functional heterogeneity in multicore architectures
ACM SIGOPS Operating Systems Review
Static bus schedule aware scratchpad allocation in multiprocessors
Proceedings of the 2011 SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
Improving programmability of heterogeneous many-core systems via explicit platform descriptions
Proceedings of the 4th International Workshop on Multicore Software Engineering
Optimizing explicit data transfers for data parallel applications on the cell architecture
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Parallelization of Belief Propagation on Cell Processors for Stereo Vision
ACM Transactions on Embedded Computing Systems (TECS)
Computers and Electrical Engineering
Journal of Systems Architecture: the EUROMICRO Journal
Optimizing two-dimensional DMA transfers for scratchpad Based MPSoCs platforms
Microprocessors & Microsystems
Configurable range memory for effective data reuse on programmable accelerators
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Hi-index | 0.00 |
As CMOS feature sizes continue to shrink and traditional microarchitectural methods for delivering high performance (e.g., deep pipelining) become too expensive and power-hungry, chip multiprocessors (CMPs) become an exciting new direction by which system designers can deliver increased performance. Exploiting parallelism in such designs is the key to high performance, and we find that parallelism must be exploited at multiple levels of the system: the thread-level parallelism that has become popular in many designs fails to exploit all the levels of available parallelism in many workloads for CMP systems. We describe the Cell Broadband Engine and the multiple levels at which its architecture exploits parallelism: data-level, instruction-level, thread-level, memory-level, and compute-transfer parallelism. By taking advantage of opportunities at all levels of the system, this CMP revolutionizes parallel architectures to deliver previously unattained levels of single chip performance. We describe how the heterogeneous cores allow to achieve this performance by parallelizing and offloading computation intensive application code onto the Synergistic Processor Element (SPE) cores using a heterogeneous thread model with SPEs. We also give an example of scheduling code to be memory latency tolerant using software pipelining techniques in the SPE.