This paper provides a background and rationale for some of the architecture and design decisions in the Cell processor, a processor optimized for compute-intensive and broadband rich media applications, jointly developed by Sony Group, Toshiba, and IBM.