To harness the potential of chip multiprocessors (CMPs) for scalable, energy-efficient performance in general-purpose computers, the Apple-CORE project has co-designed a general machine model and a concurrency control interface, with dedicated hardware support for concurrency management across multiple cores. Its SVP interface combines dataflow synchronisation with imperative programming, aiming at the efficient use of parallelism in general-purpose workloads. Its hardware implementation provides logic that coordinates single-issue, in-order, multi-threaded RISC cores into on-chip computation clusters called Microgrids. In contrast with the traditional "accelerator" approach, Microgrids are components of distributed systems on chip, in which both clusters of small cores and optional, larger sequential cores are treated as system services shared between applications. The key aspects of the design are: asynchrony, i.e. the ability to tolerate irregular, long latencies on chip; a scale-invariant programming model; a distributed chip resource model; and the transparent performance scaling of a single program binary across multiple cluster sizes. This article describes the execution model, the core micro-architecture, its realisation in a many-core, general-purpose processor chip, and its software environment. It also presents cycle-accurate simulation results for several key algorithmic and cryptographic kernels. The results show good hardware utilisation despite high-latency memory accesses, and good scalability across relatively large clusters of cores.