Do trace cache, value prediction and prefetching improve SMT throughput?

Authors:
Chen-Yong Cher;Il Park;T. N. VijayKumar
Affiliations:
ECE, Purdue University, IN;ECE, Purdue University, IN;ECE, Purdue University, IN
Venue:
ARCS'06 Proceedings of the 19th international conference on Architecture of Computing Systems
Year:
2006

Citing 27
Cited 0

Reducing memory latency via non-blocking and prefetching caches

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Design and evaluation of a compiler algorithm for prefetching

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Increasing the instruction fetch rate via multiple branch prediction and a branch address cache

ICS '93 Proceedings of the 7th international conference on Supercomputing
Complexity/performance tradeoffs with non-blocking loads

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Simultaneous multithreading: maximizing on-chip parallelism

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
SPAID: software prefetching in pointer- and call-intensive environments

Proceedings of the 28th annual international symposium on Microarchitecture
Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Trace cache: a low latency approach to high bandwidth instruction fetching

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Prefetching using Markov predictors

Proceedings of the 24th annual international symposium on Computer architecture
Alternative fetch and issue policies for the trace cache fetch mechanism

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Improving the accuracy and performance of memory communication through renaming

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Streamlining inter-operation memory communication via data dependence prediction

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
An analysis of database workload performance on simultaneous multithreaded processors

Proceedings of the 25th annual international symposium on Computer architecture
Improving trace cache effectiveness with branch promotion and trace packing

Proceedings of the 25th annual international symposium on Computer architecture
Evaluation of Design Options for the Trace Cache Fetch Mechanism

IEEE Transactions on Computers - Special issue on cache memory and related problems
Selective value prediction

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
The block-based trace cache

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Dead-block prediction & dead-block correlating prefetchers

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Cache decay: exploiting generational behavior to reduce cache leakage power

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Timekeeping in the memory system: predicting and optimizing memory behavior

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Reducing the complexity of the register file in dynamic superscalar processors

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Handling long-latency loads in a simultaneous multithreading processor

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Automatically characterizing large scale program behavior

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Reducing register ports for higher speed and lower energy

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
TCP: Tag Correlating Prefetchers

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Loose Loops Sink Chips

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture

Quantified Score

Hi-index	0.00

Visualization

Abstract

While trace cache, value prediction, and prefetching have been shown to be effective in the single-threaded superscalar, there has been no analysis of these techniques in a Simultaneously Multi threaded (SMT) processor. SMT brings new factors both for and against these techniques, and it is not known how these techniques would fare in SMT. We evaluate these techniques in an SMT to pro vide recommendations for future SMT designs. Our key contribu tions are: (1) we identify a fundamental interaction between the techniques and SMT's sharing of resources among multiple threads, and (2) we quantify the impact of this interaction on SMT through put. SMT's sharing of the instruction storage (i.e., trace cache or i-cache), physical registers, and issue queue impacts the effectiveness of trace cache, value prediction, and prefetching, respectively.