The number of cores integrated onto a single die is expected to climb steadily in the foreseeable future. This move to many-core chips is driven by the need to optimize performance per watt. How best to connect these cores, and how to program the resulting many-core processor, remain open research questions. Designs range from GPUs to cache-coherent shared memory multiprocessors to pure distributed memory chips. The 48-core SCC processor reported in this paper is an intermediate case, sharing traits of both message passing and shared memory architectures. The hardware has been described elsewhere; in this paper, we describe the programmer's view of the chip. In particular, we describe RCCE, the native message passing model created for the SCC processor.
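The abstract does not say how RCCE moves data between cores. The sketch below is therefore only a hypothetical illustration, written in Python with threads standing in for cores, of what "sharing traits of message passing and shared memory" can look like. A blocking, synchronous send/receive pair is layered over a small shared buffer plus two flags. The names `Mailbox`, `send`, `recv`, and `CHUNK` are invented for this example. They are not the RCCE API, and the buffer size is an arbitrary assumption.

```python
import threading

CHUNK = 8  # hypothetical shared-buffer size in bytes (an assumption, not an SCC figure)


class Mailbox:
    """A small shared buffer for one (sender, receiver) pair, plus two flags.

    'sent'  is raised by the sender after it copies a chunk into the buffer.
    'ready' is raised by the receiver after it copies the chunk out.
    """

    def __init__(self, size: int = CHUNK):
        self.buf = bytearray(size)
        self.length = 0          # number of valid bytes currently in buf
        self.sent = False
        self.ready = True        # buffer starts out free
        self.cond = threading.Condition()


def send(mbox: Mailbox, data: bytes) -> None:
    """Synchronous send: push data through the shared buffer chunk by chunk."""
    size = len(mbox.buf)
    for off in range(0, len(data), size):
        chunk = data[off:off + size]
        with mbox.cond:
            mbox.cond.wait_for(lambda: mbox.ready)   # wait until the buffer is free
            mbox.buf[:len(chunk)] = chunk            # "put" into the shared buffer
            mbox.length = len(chunk)
            mbox.ready = False
            mbox.sent = True                         # signal: a chunk is waiting
            mbox.cond.notify_all()
    # Return only after the receiver has consumed the final chunk.
    with mbox.cond:
        mbox.cond.wait_for(lambda: mbox.ready)


def recv(mbox: Mailbox, nbytes: int) -> bytes:
    """Blocking receive of exactly nbytes, pulled out of the shared buffer."""
    out = bytearray()
    while len(out) < nbytes:
        with mbox.cond:
            mbox.cond.wait_for(lambda: mbox.sent)    # wait for the next chunk
            out += mbox.buf[:mbox.length]            # "get" from the shared buffer
            mbox.sent = False
            mbox.ready = True                        # signal: buffer is free again
            mbox.cond.notify_all()
    return bytes(out)


if __name__ == "__main__":
    box = Mailbox()
    msg = b"hello from core 0 to core 1"
    result = {}
    receiver = threading.Thread(target=lambda: result.update(got=recv(box, len(msg))))
    receiver.start()
    send(box, msg)
    receiver.join()
    print(result["got"])  # prints b'hello from core 0 to core 1'
```

The design choice here is a rendezvous: the sender cannot overwrite the buffer until the receiver has drained it, and `send` returns only once the last chunk is gone. This keeps the shared buffer small at the cost of coupling the two sides. Each sender/receiver pair is assumed to have its own `Mailbox`, because two senders sharing one would interleave their chunks.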