The Cell Broadband Engine™ (Cell/B.E.) processor is the first implementation of the Cell Broadband Engine Architecture (CBEA), developed jointly by Sony, Toshiba, and IBM. Beyond its use in the Sony Computer Entertainment PLAYSTATION® 3 system, there is considerable interest in using the Cell/B.E. processor in workstations, media-rich electronic devices, and video and image processing systems. The Cell/B.E. processor includes one PowerPC® processor element (PPE) and eight synergistic processor elements (SPEs). The CBEA is designed to support a wide variety of programming models and allows work to be partitioned between the PPE and the eight SPEs. In this paper we show that the Cell/B.E. processor can outperform other modern processors by approximately an order of magnitude, and by even more in some cases.
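One simple way to picture "partitioning of work between the PPE and the eight SPEs" is a static block decomposition of a data array. The sketch below is illustrative only and is not the method used in the paper. The function name `partition_work`, the 16-byte alignment requirement on each SPE block, and the choice to leave leftover elements to the PPE are all assumptions made for this example.

```python
from typing import List, Tuple

NUM_SPES = 8  # the Cell/B.E. processor has eight SPEs


def partition_work(
    n_elems: int,
    elem_bytes: int = 4,
    align_bytes: int = 16,
    num_spes: int = NUM_SPES,
) -> Tuple[List[Tuple[int, int]], Tuple[int, int]]:
    """Split n_elems into num_spes (offset, count) blocks plus a PPE tail.

    Hypothetical scheme: every SPE block is a whole number of
    align_bytes-sized units (assumed alignment for data transfers),
    units are spread as evenly as possible, and any elements that do
    not fill a whole unit are left for the PPE.
    """
    if align_bytes % elem_bytes:
        raise ValueError("align_bytes must be a multiple of elem_bytes")
    elems_per_unit = align_bytes // elem_bytes   # elements per aligned unit
    total_units = n_elems // elems_per_unit      # whole aligned units available
    base, extra = divmod(total_units, num_spes)  # first `extra` SPEs get one more unit

    spe_blocks: List[Tuple[int, int]] = []
    offset = 0
    for spe in range(num_spes):
        units = base + (1 if spe < extra else 0)
        count = units * elems_per_unit
        spe_blocks.append((offset, count))
        offset += count

    ppe_tail = (offset, n_elems - offset)        # unaligned remainder for the PPE
    return spe_blocks, ppe_tail
```

For 1003 four-byte elements, SPEs 0 and 1 each receive 128 elements, SPEs 2 through 7 each receive 124, and the final 3 elements (starting at offset 1000) are left to the PPE. A static split like this keeps the assumed alignment simple; real Cell/B.E. codes may instead use dynamic or pipelined schemes.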