Supporting islands of coherency for highly-parallel embedded architectures using compile-time virtualisation

Authors: Ian Gray; Neil C. Audsley
Affiliations: University of York, York, U.K. (both authors)
Venue: Proceedings of the 13th International Workshop on Software & Compilers for Embedded Systems
Year: 2010

Abstract

As their complexity grows, the architectures of embedded systems are becoming increasingly parallel. However, the frameworks used to assist development on highly-parallel general-purpose systems (such as CORBA or MPI) are too heavyweight for the non-standard architectures of embedded systems. These frameworks introduce significant overheads because most programming languages contain little architectural or structural information. Specifically, thread migration across irregular architectures can lead to very poor memory access times, and unconstrained cache coherency cannot scale to large systems. This paper introduces a scalable approach to these problems, with minimal run-time overhead, based on the concept of 'Islands of Coherency'. Cooperating threads are grouped into clusters along with the data that they use. These clusters can then be efficiently mapped to the target architecture, with migration used only where the programmer explicitly declares it. This is supported through an existing technique called Compile-Time Virtualisation (CTV). Because CTV does not support run-time dynamism, it is extended here to allow Islands of Coherency to be implemented. The presented system is evaluated experimentally through an implementation on an FPGA platform. Simulation-based results are also presented that show the potential of this approach to increase the performance of future embedded systems.