Hardware support for large atomic units in dynamically scheduled machines
MICRO 21 Proceedings of the 21st annual workshop on Microprogramming and microarchitecture
Increasing the instruction fetch rate via multiple branch prediction and a branch address cache
ICS '93 Proceedings of the 7th international conference on Supercomputing
Control flow prediction with tree-like subgraphs for superscalar processors
Proceedings of the 28th annual international symposium on Microarchitecture
Trace cache: a low latency approach to high bandwidth instruction fetching
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Alternative fetch and issue policies for the trace cache fetch mechanism
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Trace cache design for wide-issue superscalar processors
Trace caches help dynamic branch prediction make multiple predictions per cycle by embedding some of the predictions in the trace itself. In this work, we evaluate a trace cache capable of delivering a trace consisting of a variable number of instructions via a linked-list mechanism. We evaluate several schemes in the context of an x86 processor model that stores decoded instructions. By developing a new classification for trace cache accesses, we are able to target the misses that cause the largest performance loss. We propose a hardware speculation technique, called NonHead Miss Speculation, which removes much of the penalty associated with nonhead misses in the eight applications we studied. Performance improvements ranged from 2% to 20%, with an average speedup of around 10% across our application suite.