Eight synergistic processor units (SPUs) give the Cell Broadband Engine its breakthrough performance. The SPU architecture implements a novel, pervasively data-parallel design that combines scalar and SIMD processing on a single wide (128-bit) data path. Placing many SPUs on one chip provides a high degree of thread-level parallelism.
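A small model can show what it means for scalar and SIMD processing to share one wide data path. This is a minimal sketch in plain Python, not SPU code. It assumes a 128-bit register viewed as four 32-bit lanes, with lane 0 acting as the "preferred slot" that holds scalar operands. The helper names `to_i32`, `simd_add`, and `scalar_add` are hypothetical. On real hardware each add is a single instruction (exposed in C through SPU intrinsics such as `spu_add`), not a Python loop.

```python
# Hypothetical model of a unified 128-bit SPU register (not real SPU code):
# one register holds four signed 32-bit lanes, and lane 0 is the
# "preferred slot" where scalar operands live.

MASK32 = 0xFFFFFFFF


def to_i32(x: int) -> int:
    """Wrap an integer to signed 32-bit, the way a hardware lane would."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def simd_add(a: list[int], b: list[int]) -> list[int]:
    """One SIMD add updates all four 32-bit lanes at once."""
    return [to_i32(x + y) for x, y in zip(a, b)]


def scalar_add(a: list[int], b: list[int]) -> int:
    """Scalar add on the same wide data path: every lane is computed,
    but only the preferred slot (lane 0) carries the scalar result."""
    return simd_add(a, b)[0]


if __name__ == "__main__":
    print(simd_add([1, 2, 3, 4], [10, 20, 30, 40]))  # [11, 22, 33, 44]
    # Lanes 1-3 hold don't-care values; only lane 0 matters for a scalar.
    print(scalar_add([5, 0, 0, 0], [7, 9, 9, 9]))    # 12
```

In this model a scalar operation is simply a vector operation whose other lanes are ignored, so no separate scalar register file or scalar execution units are needed.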