A bridging model for parallel computation
Communications of the ACM
Using MPI (2nd ed.): portable parallel programming with the message-passing interface
Using MPI (2nd ed.): portable parallel programming with the message-passing interface
Pentium Processor System Architecture
Pentium Processor System Architecture
An analysis of the impact of MPI overlap and independent progress
Proceedings of the 18th annual international conference on Supercomputing
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Linux Device Drivers, 3rd Edition
Linux Device Drivers, 3rd Edition
ALS '01 Proceedings of the 5th annual Linux Showcase & Conference - Volume 5
Using OpenMP: Portable Shared Memory Parallel Programming (Scientific and Engineering Computation)
Using OpenMP: Portable Shared Memory Parallel Programming (Scientific and Engineering Computation)
Patterns for parallel programming
Patterns for parallel programming
The 48-core SCC Processor: the Programmer's View
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
An efficient software shared virtual memory for the single-chip cloud computer
Proceedings of the Second Asia-Pacific Workshop on Systems
Invasive MPI on intel's single-chip cloud computer
ARCS'12 Proceedings of the 25th international conference on Architecture of Computing Systems
High-performance RMA-based broadcast on the intel SCC
Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
Concurrency and Computation: Practice & Experience
Proceedings of the tenth ACM international conference on Embedded software
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Sparse matrix-vector multiplication on the Single-Chip Cloud Computer many-core processor
Journal of Parallel and Distributed Computing
Hi-index | 0.00 |
Many-core chips are changing the way high-performance computing systems are built and programmed. As it is becoming increasingly difficult to maintain cache coherence across many cores, manufacturers are exploring designs that do not feature any cache coherence between cores. Communications on such chips are naturally implemented using message passing, which makes them resemble clusters, but with an important difference. Special hardware can be provided that supports very fast on-chip communications, reducing latency and increasing bandwidth. We present one such chip, the Single-Chip Cloud Computer (SCC). This is an experimental processor, created by Intel Labs. We describe two communication libraries available on SCC: RCCE and Rckmb. RCCE is a light-weight, minimal library for writing message passing parallel applications. Rckmb provides the data link layer for running network services such as TCP/IP. Both utilize SCC's non-cache-coherent shared memory for transferring data between cores without needing to go off-chip. In this paper we describe the design and implementation of RCCE and Rckmb. To compare their performance, we consider simple benchmarks run with RCCE, and MPI over TCP/IP.