Fast thread migration via cache working set prediction

Authors:
Jeffery A. Brown; Leo Porter; Dean M. Tullsen
Affiliations:
University of California, San Diego, La Jolla, CA 92093-0404 (all three authors)
Venue:
HPCA '11: Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
Year:
2011

Cited by (7):

CRQ-based fair scheduling on composable multicore architectures
Proceedings of the 26th ACM international conference on Supercomputing

Software data-triggered threads
Proceedings of the ACM international conference on Object oriented programming systems languages and applications

Moths: Mobile threads for on-chip networks
ACM Transactions on Embedded Computing Systems (TECS) - Special section on ESTIMedia'12, LCTES'11, rigorous embedded systems design, and multiprocessor system-on-chip for cyber-physical systems

Orchestrator: a low-cost solution to reduce voltage emergencies for multi-threaded applications
Proceedings of the Conference on Design, Automation and Test in Europe

Revisiting reorder buffer architecture for next generation high performance computing
The Journal of Supercomputing

PAIS: Parallelism-aware interconnect scheduling in multicores
ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers

An efficient and comprehensive scheduler on Asymmetric Multicore Architecture systems
Journal of Systems Architecture: the EUROMICRO Journal

Abstract

The most significant source of lost performance when a thread migrates between cores is the loss of cache state. Post-migration performance can improve substantially if the cache working set is proactively moved along with the thread. This work accelerates thread start-up after migration by predicting the application's working set and prefetching it into the new core's cache. It shows that simply moving cache state performs poorly, and that moving the instruction working set can be even more critical than moving data. The paper demonstrates a technique that captures a thread's access behavior, summarizes that behavior into a compact form for transfer between cores, and then prefetches the appropriate data into the new caches based on the summary. It presents a detailed study of single-thread migration effects and then demonstrates the technique's utility on a speculative multithreading architecture. Working set prediction improves the performance of short-lived threads by as much as 2x, and in a full speculative multithreading implementation it nearly doubles the effectiveness of the spawned threads.
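The capture, summarize, and prefetch steps described above can be illustrated with a small Python model. This is a hypothetical sketch, not the authors' mechanism: the 64-byte block size, the 32-block region size, the LRU-ordered recorder, and the region-bitmask summary format (`AccessRecorder`, `summarize`, `expand`) are all assumptions chosen only to show how a compact working-set summary could be built on the source core and expanded into prefetch addresses on the destination core.

```python
from collections import OrderedDict

BLOCK_BYTES = 64      # assumed cache block size (hypothetical)
REGION_BLOCKS = 32    # assumed number of blocks covered by one summary entry (hypothetical)


class AccessRecorder:
    """Hypothetical recorder: keeps the most recently touched cache blocks in LRU order."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block number -> True, oldest first

    def record(self, addr):
        block = addr // BLOCK_BYTES
        self.blocks.pop(block, None)   # re-touching a block moves it to the MRU end
        self.blocks[block] = True
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block


def summarize(blocks):
    """Compress block numbers into a compact {region: bitmask} summary for transfer."""
    summary = {}
    for block in blocks:
        region, offset = divmod(block, REGION_BLOCKS)
        summary[region] = summary.get(region, 0) | (1 << offset)
    return summary


def expand(summary):
    """On the destination core, turn a summary back into block-aligned prefetch addresses."""
    addrs = []
    for region in sorted(summary):
        mask = summary[region]
        for offset in range(REGION_BLOCKS):
            if (mask >> offset) & 1:
                addrs.append((region * REGION_BLOCKS + offset) * BLOCK_BYTES)
    return addrs


# Source core: record instruction fetches and data accesses in separate streams.
inst = AccessRecorder(capacity=256)
data = AccessRecorder(capacity=256)
for pc in range(0x400000, 0x400400, 4):    # a 1 KiB loop body, i.e. 16 blocks
    inst.record(pc)
for a in (0x10000, 0x10040, 0x10008, 0x20000):
    data.record(a)

# Migration: only the two small summaries travel with the thread.
inst_summary = summarize(inst.blocks)
data_summary = summarize(data.blocks)

# Destination core: issue instruction prefetches before data prefetches (an assumed policy).
prefetch_list = expand(inst_summary) + expand(data_summary)
print(len(prefetch_list), [hex(a) for a in expand(data_summary)])
```

The region bitmask is one plausible way to exploit spatial locality: the 16 contiguous instruction blocks collapse into a single summary entry. Sending instruction addresses first is only an illustrative choice motivated by the abstract's observation that the instruction working set can matter more than data; the sketch does not model prefetch timing or cache capacity on the new core.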