Hitting the memory wall: implications of the obvious
ACM SIGARCH Computer Architecture News
Optimizing pipelines for power and performance
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Power Efficient Processor Architecture and The Cell Processor
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Application of full-system simulation in exploratory system design and development
IBM Journal of Research and Development
Introduction to the cell multiprocessor
IBM Journal of Research and Development - POWER5 and packaging
IBM Journal of Research and Development
Dynamic multigrain parallelization on the cell broadband engine
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
On the Design of a Photonic Network-on-Chip
NOCS '07 Proceedings of the First International Symposium on Networks-on-Chip
JTRES '07 Proceedings of the 5th international workshop on Java technologies for real-time and embedded systems
Characterizing the Cell EIB On-Chip Network
IEEE Micro
Executing stream joins on the cell processor
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
CellSort: high performance sorting on the cell processor
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Hardware-aware analysis and optimization of stable fluids
Proceedings of the 2008 symposium on Interactive 3D graphics and games
Prefetching irregular references for software cache on cell
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Orchestrating data transfer for the cell/B.E. processor
Proceedings of the 22nd annual international conference on Supercomputing
IWOMP '07 Proceedings of the 3rd international workshop on OpenMP: A Practical Programming Model for the Multi-Core Era
Toward Human Arm Attention and Recognition
Neural Information Processing
A Constraint Programming Approach for Allocation and Scheduling on the CELL Broadband Engine
CP '08 Proceedings of the 14th international conference on Principles and Practice of Constraint Programming
A Novel Asynchronous Software Cache Implementation for the Cell-BE Processor
Languages and Compilers for Parallel Computing
Hybrid access-specific software cache techniques for the cell BE architecture
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
COMIC: a coherent shared memory interface for cell be
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Accelerating BLASTP on the Cell Broadband Engine
PRIB '08 Proceedings of the Third IAPR International Conference on Pattern Recognition in Bioinformatics
International Journal of Parallel Programming
SPENK: adding another level of parallelism on the cell broadband engine
IFMT '08 Proceedings of the 1st international forum on Next-generation multicore/manycore technologies
An efficient in-place 3D transpose for multicore processors with software managed memory hierarchy
IFMT '08 Proceedings of the 1st international forum on Next-generation multicore/manycore technologies
Automatic Pre-Fetch and Modulo Scheduling Transformations for the Cell BE Architecture
Languages and Compilers for Parallel Computing
Scientific Programming - High Performance Computing with the Cell Broadband Engine
Scientific Programming - High Performance Computing with the Cell Broadband Engine
High performance protein sequence database scanning on the Cell Broadband Engine
Scientific Programming - High Performance Computing with the Cell Broadband Engine
Building high-resolution sky images using the Cell/B.E.
Scientific Programming - High Performance Computing with the Cell Broadband Engine
Computing discrete transforms on the Cell Broadband Engine
Parallel Computing
CellJoin: a parallel stream join operator for the cell processor
The VLDB Journal — The International Journal on Very Large Data Bases
Celling SHIM: compiling deterministic concurrency to a heterogeneous multicore
Proceedings of the 2009 ACM symposium on Applied Computing
Scheduling dynamic parallelism on accelerators
Proceedings of the 6th ACM conference on Computing frontiers
Evaluating multi-core platforms for HPC data-intensive kernels
Proceedings of the 6th ACM conference on Computing frontiers
High-performance regular expression scanning on the Cell/B.E. processor
Proceedings of the 23rd international conference on Supercomputing
Computer generation of fast fourier transforms for the cell broadband engine
Proceedings of the 23rd international conference on Supercomputing
DBDB: optimizing DMATransfer for the cell be architecture
Proceedings of the 23rd international conference on Supercomputing
Efficient high performance collective communication for the cell blade
Proceedings of the 23rd international conference on Supercomputing
Time-predictable computer architecture
EURASIP Journal on Embedded Systems - FPGA supercomputing platforms, architectures, and techniques for accelerating computationally complex algorithms
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Performance balancing: software-based on-chip memory management for effective CMP executions
Proceedings of the 10th workshop on MEmory performance: DEaling with Applications, systems and architecture
The multikernel: a new OS architecture for scalable multicore systems
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
No cache-coherence: a single-cycle ring interconnection for multi-core L1-NUCA sharing on 3D chips
Proceedings of the 46th Annual Design Automation Conference
Modeling advanced collective communication algorithms on cell-based systems
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Buffer sharing in CSP-like programs
MEMOCODE'09 Proceedings of the 7th IEEE/ACM international conference on Formal Methods and Models for Codesign
Prototype design of cluster-based homogeneous multiprocessor system-on-chip
ASID'09 Proceedings of the 3rd international conference on Anti-Counterfeiting, security, and identification in communication
Mesh-of-trees and alternative interconnection networks for single-chip parallelism
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
ACM Transactions on Architecture and Code Optimization (TACO)
The Journal of Supercomputing
Optimizing the use of static buffers for DMA on a CELL chip
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
FFTC: fastest Fourier transform for the IBM cell broadband engine
HiPC'07 Proceedings of the 14th international conference on High performance computing
CG-Cell: an NPB benchmark implementation on cell broadband engine
ICDCN'08 Proceedings of the 9th international conference on Distributed computing and networking
Multi-stage benders decomposition for optimizing multicore architectures
CPAIOR'08 Proceedings of the 5th international conference on Integration of AI and OR techniques in constraint programming for combinatorial optimization problems
A real-time Java chip-multiprocessor
ACM Transactions on Embedded Computing Systems (TECS)
Communication-aware heuristics for run-time task mapping on NoC-based MPSoC platforms
Journal of Systems Architecture: the EUROMICRO Journal
Back Suction: Service Guarantees for Latency-Sensitive On-chip Networks
NOCS '10 Proceedings of the 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip
Network interface design based on mutual interface definition
International Journal of High Performance Systems Architecture
The reverse-acceleration model for programming petascale hybrid systems
IBM Journal of Research and Development
ATAC: a 1000-core cache-coherent processor with on-chip optical network
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Recursion-driven parallel code generation for multi-core platforms
Proceedings of the Conference on Design, Automation and Test in Europe
A link arbitration scheme for quality of service in a latency-optimized network-on-chip
Proceedings of the Conference on Design, Automation and Test in Europe
Buffer sharing in rendezvous programs
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems - Special section on the ACM IEEE international conference on formal methods and models for codesign (MEMOCODE) 2009
Weighted random oblivious routing on torus networks
Proceedings of the 5th ACM/IEEE Symposium on Architectures for Networking and Communications Systems
Exploring a Novel Gathering Method for Finite Element Codes on the Cell/B.E. Architecture
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Efficient throughput-guarantees for latency-sensitive networks-on-chip
Proceedings of the 2010 Asia and South Pacific Design Automation Conference
An analytical network performance model for SIMD processor CSX600 interconnects
Journal of Systems Architecture: the EUROMICRO Journal
Throughput-Effective On-Chip Networks for Manycore Accelerators
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
A portable, efficient inter-core communication scheme for embedded multicore platforms
Journal of Systems Architecture: the EUROMICRO Journal
Hierarchical circuit-switched NoC for multicore video processing
Microprocessors & Microsystems
Improved scalability by using hardware-aware thread affinities
Facing the multicore-challenge
Transactions on high-performance embedded architectures and compilers III
Improved scalability by using hardware-aware thread affinities
Facing the multicore-challenge
International Journal of Communication Networks and Distributed Systems
A 98 GMACs/W 32-core vector processor in 65nm CMOS
Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design
Cat-tail dma: efficient image data transport for multicore embedded mobile systems
Journal of Mobile Multimedia
A hybrid strategy for mapping multiple throughput-constrained applications on MPSoCs
CASES '11 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems
Optimizing explicit data transfers for data parallel applications on the cell architecture
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
TL-DAE: thread-level decoupled access/execution for OpenMP on the cyclops-64 many-core processor
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Exploration of 3D grid caching strategies for ray-shooting
Journal of Real-Time Image Processing
Remote store programming: a memory model for embedded multicore
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Performance impact of task mapping on the cell BE multicore processor
ISCA'10 Proceedings of the 2010 international conference on Computer Architecture
Analysis of gravitational wave signals on heterogeneous architectures
PARA'10 Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume 2
A dynamically reconfigurable communication architecture for multicore embedded systems
Journal of Systems Architecture: the EUROMICRO Journal
Networks on chips: structure and design methodologies
Journal of Electrical and Computer Engineering - Special issue on Networks-on-Chip: Architectures, Design Methodologies, and Case Studies
A metric for layout-friendly microarchitecture optimization in high-level synthesis
Proceedings of the 49th Annual Design Automation Conference
Microwave tomography for breast cancer detection on Cell broadband engine processors
Journal of Parallel and Distributed Computing
MCEmu: A Framework for Software Development and Performance Analysis of Multicore Systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Increasing the efficiency of the DaCS programming model for heterogeneous systems
PPAM'11 Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part I
A transpose-free in-place SIMD optimized FFT
ACM Transactions on Architecture and Code Optimization (TACO)
Scalable communication architectures for massively parallel hardware multi-processors
Journal of Parallel and Distributed Computing
A real-time, energy-efficient system software suite for heterogeneous multicore platforms
Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Hardware-software coherence protocol for the coexistence of caches and local memories
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Scheduling streaming applications on a complex multicore platform
Concurrency and Computation: Practice & Experience
Accelerating throughput-aware runtime mapping for heterogeneous MPSoCs
ACM Transactions on Design Automation of Electronic Systems (TODAES) - Special section on adaptive power management for energy and temperature-aware computing systems
Parallelization strategies for the points of interests algorithm on the cell processor
ISPA'07 Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications
SSDM: smart stack data management for software managed multicores (SMMs)
Proceedings of the 50th Annual Design Automation Conference
CADSE: communication aware design space exploration for efficient run-time MPSoC management
Frontiers of Computer Science: Selected Publications from Chinese Universities
Designing on-chip networks for throughput accelerators
ACM Transactions on Architecture and Code Optimization (TACO)
Scheduling of synchronous data flow models onto scratchpad memory-based embedded processors
ACM Transactions on Embedded Computing Systems (TECS) - Special Section on ESTIMedia'10
Flexible filters in stream programs
ACM Transactions on Embedded Computing Systems (TECS)
Optimizing two-dimensional DMA transfers for scratchpad Based MPSoCs platforms
Microprocessors & Microsystems
Design of massively parallel hardware multi-processors for highly-demanding embedded applications
Microprocessors & Microsystems
Direct distributed memory access for CMPs
Journal of Parallel and Distributed Computing
Hi-index | 0.00 |
Multicore designs promise various power-performance and area-performance benefits. But inadequate design of the on-chip communication network can deprive applications of these benefits. To illuminate this important point in multicore processor design, the authors analyze the Cell processor's communication network, using a series of benchmarks involving DMA traffic patterns and synchronization protocols.