Moths: Mobile threads for on-chip networks

Authors:
Matthew Misler; Natalie Enright Jerger
Affiliations:
University of Toronto; University of Toronto, Toronto, Canada
Venue:
ACM Transactions on Embedded Computing Systems (TECS) - Special section on ESTIMedia'12, LCTES'11, rigorous embedded systems design, and multiprocessor system-on-chip for cyber-physical systems
Year:
2013

Abstract

As the number of cores integrated on a single chip continues to increase, communication has the potential to become a severe bottleneck to overall system performance. The presence of thread sharing and the distribution of data across cache banks on the chip can result in long-distance communication. Long-distance communication incurs substantial latency that impacts performance; furthermore, this communication consumes significant dynamic power when packets are switched over many Network-on-Chip (NoC) links and routers. Thread migration can mitigate the problems created by long-distance communication. This article presents Moths, an efficient runtime algorithm that responds automatically to dynamic NoC traffic patterns, providing beneficial thread migration to decrease overall traffic volume and average packet latency. Moths reduces on-chip network latency by up to 28.4% (18.0% on average) and traffic volume by up to 24.9% (20.6% on average) across a variety of commercial and scientific benchmarks.