Improving instruction delivery with a block-aware ISA

Authors:
Ahmad Zmily;Earl Killian;Christos Kozyrakis
Affiliations:
Electrical Engineering Department, Stanford University;Electrical Engineering Department, Stanford University;Electrical Engineering Department, Stanford University
Venue:
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Year:
2005


Abstract

Instruction delivery is a critical component of wide-issue processors because its bandwidth and accuracy place an upper limit on performance. Front-end accuracy and bandwidth are limited by instruction cache misses, multi-cycle instruction cache accesses, and target or direction mispredictions for control-flow operations. This paper introduces a block-aware ISA (BLISS) that supports accurate instruction delivery by defining basic block descriptors in addition to, and separate from, the actual instructions in a program. We show that BLISS enables a decoupled front-end that tolerates cache latency and achieves higher speculation accuracy. This translates to a 20% IPC improvement and a 14% energy improvement over conventional front-ends. We also demonstrate that a BLISS-based front-end outperforms, by 13%, decoupled front-ends that detect fetched blocks dynamically in hardware without any information from the ISA.
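To illustrate the idea of descriptors kept separate from instructions, the following is a minimal sketch, not the BLISS encoding. The abstract does not specify descriptor fields, so the class `BlockDescriptor`, its fields (`kind`, `length`, `instr_ptr`, `target`), and the function `walk` are all hypothetical names chosen for this example. The sketch only shows how a front-end could follow control flow through descriptors alone and learn which instruction ranges to fetch before touching the instruction stream.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class BlockDescriptor:
    """Hypothetical basic block descriptor (illustrative fields only)."""
    kind: str                      # "fallthrough", "jump", or "branch"
    length: int                    # number of instructions in the block
    instr_ptr: int                 # index of the block's first instruction
    target: Optional[int] = None   # descriptor index of the taken target


def walk(
    descriptors: List[BlockDescriptor],
    start: int,
    predict_taken: Callable[[int], bool],
    max_blocks: int,
) -> List[Tuple[int, int]]:
    """Follow descriptors only and return the instruction ranges to fetch.

    Each range is a half-open interval (first, last + 1) of instruction
    indices. `predict_taken` stands in for a branch direction predictor.
    """
    ranges: List[Tuple[int, int]] = []
    pc = start  # index into the descriptor list, not the instruction list
    for _ in range(max_blocks):
        if pc is None or not 0 <= pc < len(descriptors):
            break
        d = descriptors[pc]
        ranges.append((d.instr_ptr, d.instr_ptr + d.length))
        if d.kind == "jump":
            pc = d.target
        elif d.kind == "branch" and predict_taken(pc):
            pc = d.target
        else:
            pc += 1  # fall through to the next descriptor
    return ranges


# Three blocks: a conditional branch, a fall-through block, and a jump back.
descriptors = [
    BlockDescriptor("branch", length=3, instr_ptr=0, target=2),
    BlockDescriptor("fallthrough", length=2, instr_ptr=3),
    BlockDescriptor("jump", length=4, instr_ptr=5, target=0),
]

taken_path = walk(descriptors, 0, lambda pc: True, max_blocks=4)
not_taken_path = walk(descriptors, 0, lambda pc: False, max_blocks=3)
print(taken_path)
print(not_taken_path)
```

Because the walk never reads an instruction, the list of ranges it produces could, under these assumptions, be handed to an instruction cache ahead of time, which is the sense in which a descriptor-driven front-end is decoupled from instruction fetch.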