A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Stride directed prefetching in scalar processors
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Evaluating stream buffers as a secondary cache replacement
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Compiler optimizations for improving data locality
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Dynamic self-invalidation: reducing coherence overhead in shared-memory multiprocessors
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Speculative multithreaded processors
ICS '98 Proceedings of the 12th international conference on Supercomputing
Data speculation support for a chip multiprocessor
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
A Chip-Multiprocessor Architecture with Speculative Multithreading
IEEE Transactions on Computers
Selective, accurate, and timely self-invalidation using last-touch prediction
Proceedings of the 27th annual international symposium on Computer architecture
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Effective Hardware-Based Data Prefetching for High-Performance Processors
IEEE Transactions on Computers
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Physical Experimentation with Prefetching Helper Threads on Intel's Hyper-Threaded Processors
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
The STAMPede approach to thread-level speculation
ACM Transactions on Computer Systems (TOCS)
Fixed and Adaptive Sequential Prefetching in Shared Memory Multiprocessors
ICPP '93 Proceedings of the 1993 International Conference on Parallel Processing - Volume 01
Tolerating Dependences Between Large Speculative Threads Via Sub-Threads
Proceedings of the 33rd annual international symposium on Computer Architecture
Hi-index | 0.00 |
With the advent of chip-multiprocessors (CMPs), Thread-Level Speculation (TLS) remains a promising technique for exploiting this highly multithreaded hardware to improve the performance of an individual program. However, with such speculatively-parallel execution the cache locality once enjoyed by the original uniprocessor execution is significantly disrupted: for TLS execution on a four-processor CMP, we find that the data-cache miss rates are nearly four-times those of the uniprocessor case, even though TLS execution utilizes four private data caches (i.e., four-fold greater cache capacity). We break down the TLS cache locality problem into instruction and data cache, execution stages, and parallel access patterns, and propose methods to improve cache locality in each of these areas. We find that for parallel regions across 13 SPECint applications our simple and low-cost techniques reduce data-cache misses by 38%, improve performance by 12.8%, and significantly improve scalability--further enhancing the feasibility of TLS as a way to capitalize on future CMPs.