Aggressive Dynamic Execution of Decoded Traces

Authors and affiliations:
Benjamin Bishop, Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802
Thomas P. Kelliher, Department of Mathematics and Computer Science, Goucher College, Baltimore, MD 21204
Robert M. Owens, Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802
Mary Jane Irwin, Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802
Venue:
Journal of VLSI Signal Processing Systems - Special issue on the 1997 IEEE workshop on signal processing systems (SiPS): design and implementation
Year:
1999

Abstract

In this paper, we consider the increased performance that can be obtained by using, in concert, three previously proposed enhancements: aggressive dynamic (run-time) instruction scheduling, the reuse of decoded instructions, and trace scheduling. Both aggressive dynamic instruction scheduling and decoded instruction reuse have been used in commercial systems. We show that these three enhancements complement and support one another. Hence, while each of these enhancements has been shown to have merit in its own right, we claim that when they are used in concert the overall advantage is greater than that obtained by using any one of them singly. To support this claim, we present results from running benchmarks representing several common multimedia kernels. Simulations show 7.3 instructions completed per cycle for the best-performing benchmark on a reasonably aggressive microarchitecture that combines trace scheduling of decoded instructions (i.e., decoded traces) with aggressive dynamic execution.
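To make the combination concrete, the following is a toy sketch, not the authors' simulator or microarchitecture. A hypothetical `DecodedTraceCache`, keyed by a trace's starting PC and the outcomes of the branches inside it, stores instructions that have already been decoded so a repeated path can be reused without re-decoding. A hypothetical `dataflow_cycles` function models aggressive dynamic execution as greedy out-of-order issue of ready instructions, oldest first, up to a fixed issue width. The names `DecodedOp`, `DecodedTraceCache`, `dataflow_cycles` and `ipc`, as well as the assumptions of register renaming (only true dependences are tracked), unlimited functional units, and a scheduling window covering the whole trace, are all illustrative assumptions. Instructions completed per cycle is then the trace length divided by the cycle count.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class DecodedOp:
    """Hypothetical decoded instruction: an optional destination register,
    source registers, and a result latency in cycles (assumed >= 1)."""
    dest: Optional[str]
    srcs: Tuple[str, ...] = ()
    latency: int = 1


# Hypothetical trace key: (starting PC, taken/not-taken outcome of each branch in the trace).
TraceKey = Tuple[int, Tuple[bool, ...]]


class DecodedTraceCache:
    """Toy store of already-decoded traces, so a repeated path is reused without re-decoding."""

    def __init__(self) -> None:
        self._traces: Dict[TraceKey, List[DecodedOp]] = {}

    def lookup(self, key: TraceKey) -> Optional[List[DecodedOp]]:
        # Hit: the whole decoded trace is available at once; miss: None.
        return self._traces.get(key)

    def fill(self, key: TraceKey, ops: List[DecodedOp]) -> None:
        # Record a trace after it has been decoded once (the role a fill unit would play).
        self._traces[key] = list(ops)


def dataflow_cycles(ops: List[DecodedOp], width: int) -> int:
    """Cycles to complete `ops` under greedy dynamic scheduling: each cycle, issue up to
    `width` instructions whose operands are ready, oldest first, regardless of program order.
    Register renaming is assumed, so only true (read-after-write) dependences are modeled."""
    if width < 1:
        raise ValueError("issue width must be at least 1")
    n = len(ops)
    # Resolve each source to the index of its most recent earlier producer in the trace;
    # sources with no producer inside the trace are treated as already available.
    producers: List[List[int]] = []
    last_writer: Dict[str, int] = {}
    for i, op in enumerate(ops):
        producers.append([last_writer[s] for s in op.srcs if s in last_writer])
        if op.dest is not None:
            last_writer[op.dest] = i
    done: List[Optional[int]] = [None] * n  # cycle at which each result becomes available
    cycle, remaining = 0, n
    while remaining:
        slots = width
        for i in range(n):  # program order, so older instructions get priority
            if slots == 0:
                break
            if done[i] is not None:
                continue  # already issued
            if all(done[p] is not None and done[p] <= cycle for p in producers[i]):
                done[i] = cycle + ops[i].latency
                slots -= 1
                remaining -= 1
        cycle += 1
    return max((d for d in done if d is not None), default=0)


def ipc(ops: List[DecodedOp], width: int) -> float:
    """Instructions completed per cycle for one trace."""
    cycles = dataflow_cycles(ops, width)
    return len(ops) / cycles if cycles else 0.0
```

For example, four independent single-cycle instructions on a 4-wide machine finish in one cycle (4.0 instructions per cycle), while a chain of four dependent ones needs four cycles (1.0 instruction per cycle) at any width. In this toy model the decoded trace supplies many instructions at once, and dynamic scheduling converts that bandwidth into throughput only when the trace contains independent work.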