Improving cache locality for thread-level speculation

Authors:
Stanley L. C. Fung;J. Gregory Steffan
Affiliations:
Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada;Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada
Venue:
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Year:
2006

Citing 15
Cited 1

A data locality optimizing algorithm

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Stride directed prefetching in scalar processors

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Evaluating stream buffers as a secondary cache replacement

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Compiler optimizations for improving data locality

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Dynamic self-invalidation: reducing coherence overhead in shared-memory multiprocessors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Speculative multithreaded processors

ICS '98 Proceedings of the 12th international conference on Supercomputing
Data speculation support for a chip multiprocessor

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
A Chip-Multiprocessor Architecture with Speculative Multithreading

IEEE Transactions on Computers
Selective, accurate, and timely self-invalidation using last-touch prediction

Proceedings of the 27th annual international symposium on Computer architecture
Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Effective Hardware-Based Data Prefetching for High-Performance Processors

IEEE Transactions on Computers
Speculative Versioning Cache

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Physical Experimentation with Prefetching Helper Threads on Intel's Hyper-Threaded Processors

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
The STAMPede approach to thread-level speculation

ACM Transactions on Computer Systems (TOCS)
Fixed and Adaptive Sequential Prefetching in Shared Memory Multiprocessors

ICPP '93 Proceedings of the 1993 International Conference on Parallel Processing - Volume 01

Tolerating Dependences Between Large Speculative Threads Via Sub-Threads

Proceedings of the 33rd annual international symposium on Computer Architecture

Quantified Score

Hi-index	0.00

Visualization

Abstract

With the advent of chip-multiprocessors (CMPs), Thread-Level Speculation (TLS) remains a promising technique for exploiting this highly multithreaded hardware to improve the performance of an individual program. However, with such speculatively-parallel execution the cache locality once enjoyed by the original uniprocessor execution is significantly disrupted: for TLS execution on a four-processor CMP, we find that the data-cache miss rates are nearly four-times those of the uniprocessor case, even though TLS execution utilizes four private data caches (i.e., four-fold greater cache capacity). We break down the TLS cache locality problem into instruction and data cache, execution stages, and parallel access patterns, and propose methods to improve cache locality in each of these areas. We find that for parallel regions across 13 SPECint applications our simple and low-cost techniques reduce data-cache misses by 38%, improve performance by 12.8%, and significantly improve scalability--further enhancing the feasibility of TLS as a way to capitalize on future CMPs.