The SGI Origin: a ccNUMA highly scalable server
Proceedings of the 24th annual international symposium on Computer architecture
Route packets, not wires: on-chip inteconnection networks
Proceedings of the 38th annual Design Automation Conference
PowerHerd: dynamic satisfaction of peak power constraints in interconnection networks
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
A Delay Model and Speculative Architecture for Pipelined Routers
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
A New Memory Monitoring Scheme for Memory-Aware Scheduling and Partitioning
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Xen and the art of virtualization
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Capriccio: scalable threads for internet services
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Power-driven Design of Router Microarchitectures in On-chip Networks
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Principles and Practices of Interconnection Networks
Principles and Practices of Interconnection Networks
Low-Latency Virtual-Channel Routers for On-Chip Networks
Proceedings of the 31st annual international symposium on Computer architecture
Thermal-Aware IP Virtualization and Placement for Networks-on-Chip Architecture
ICCD '04 Proceedings of the IEEE International Conference on Computer Design
Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Pin: building customized program analysis tools with dynamic instrumentation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Supporting task migration in multi-processor systems-on-chip: a feasibility study
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Proceedings of the 43rd annual Design Automation Conference
Virtual hierarchies to support server consolidation
Proceedings of the 34th annual international symposium on Computer architecture
Express virtual channels: towards the ideal interconnection fabric
Proceedings of the 34th annual international symposium on Computer architecture
Heuristics for Dynamic Task Mapping in NoC-based Heterogeneous MPSoCs
RSP '07 Proceedings of the 18th IEEE/IFIP International Workshop on Rapid System Prototyping
lmbench: portable tools for performance analysis
ATEC '96 Proceedings of the 1996 annual conference on USENIX Annual Technical Conference
Context switch overheads for Linux on ARM platforms
Proceedings of the 2007 workshop on Experimental computer science
Thread scheduling for multi-core platforms
HOTOS'07 Proceedings of the 11th USENIX workshop on Hot topics in operating systems
An Evaluation of Server Consolidation Workloads for Multi-Core Designs
IISWC '07 Proceedings of the 2007 IEEE 10th International Symposium on Workload Characterization
Application-aware deadlock-free oblivious routing
Proceedings of the 36th annual international symposium on Computer architecture
Thread motion: fine-grained power management for multi-core systems
Proceedings of the 36th annual international symposium on Computer architecture
Preemptive virtual clock: a flexible, efficient, and cost-effective QOS scheme for networks-on-chip
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Application-aware prioritization mechanisms for on-chip networks
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Addressing shared resource contention in multicore processors via scheduling
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
An operating system for multicore and clouds: mechanisms and implementation
Proceedings of the 1st ACM symposium on Cloud computing
Virtualizing network-on-chip resources in chip-multiprocessors
Microprocessors & Microsystems
Fast thread migration via cache working set prediction
HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
Performance and power aware CMP thread allocation modeling
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Hi-index | 0.00 |
As the number of cores integrated on a single chip continues to increase, communication has the potential to become a severe bottleneck to overall system performance. The presence of thread sharing and the distribution of data across cache banks on the chip can result in longdistance communication. Long-distance communication incurs substantial latency that impacts performance; furthermore, this communication consumes significant dynamic power when packets are switched over many Network-on-Chip (NoC) links and routers. Thread migration can mitigate problems created by long distance communication. This article presents Moths, an efficient runtime algorithm that responds automatically to dynamic NoC traffic patterns, providing beneficial thread migration to decrease overall traffic volume and average packet latency. Moths reduces on-chip network latency by up to 28.4% (18.0% on average) and traffic volume by up to 24.9% (20.6% on average) across a variety of commercial and scientific benchmarks.