Energy-efficient and high-performance instruction fetch using a block-aware ISA

Authors:
Ahmad Zmily;Christos Kozyrakis
Affiliations:
Stanford University;Stanford University
Venue:
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Year:
2005

Citing 19
Cited 4

A comprehensive instruction fetch mechanism for a processor supporting speculative execution

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Fast and accurate instruction fetch and branch prediction

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
A performance study of software and hardware data prefetching schemes

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Enhancing instruction scheduling with a block-structured ISA

International Journal of Parallel Programming
Alternative fetch and issue policies for the trace cache fetch mechanism

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Reducing the performance impact of instruction cache misses by writing instructions into the reservation stations out-of-order

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Improving trace cache effectiveness with branch promotion and trace packing

Proceedings of the 25th annual international symposium on Computer architecture
Reducing power in superscalar processor caches using subbanking, multiple line buffers and bit-line segmentation

ISLPED '99 Proceedings of the 1999 international symposium on Low power electronics and design
Fetch directed instruction prefetching

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Selective cache ways: on-demand cache resource allocation

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Wattch: a framework for architectural-level power analysis and optimizations

Proceedings of the 27th annual international symposium on Computer architecture
Optimizations Enabled by a Decoupled Front-End Architecture

IEEE Transactions on Computers
Reducing set-associative cache energy via way-prediction and selective direct-mapping

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
High Performance and Energy Efficient Serial Prefetch Architecture

ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
The reduction of branch instruction execution overhead using structured control flow

ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
Power-Aware Control Speculation through Selective Throttling

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Branch Predictor Prediction: A Power-Aware Branch Predictor for High-Performance Processors

ICCD '02 Proceedings of the 2002 IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD'02)
Power Issues Related to Branch Prediction

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
On the performance and use of dense servers

IBM Journal of Research and Development

Simultaneously improving code size, performance, and energy in embedded processors

Proceedings of the conference on Design, automation and test in Europe: Proceedings
Block-aware instruction set architecture

ACM Transactions on Architecture and Code Optimization (TACO)
A low power front-end for embedded processors using a block-aware instruction set

CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Online strategies for high-performance power-aware thread execution on emerging multiprocessors

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

The front-end in superscalar processors must deliver high application performance in an energy-effective manner. Impediments such as multi-cycle instruction accesses, instruction-cache misses, and mispredictions reduce performance by 48% and increase energy consumption by 21%. This paper presents a block-aware instruction set architecture (BLISS) that defines basic block descriptors in addition to the actual instructions in a program. BLISS allows for a decoupled front-end that reduces the time and energy spent on misspeculated instructions. It also allows for accurate instruction prefetching and energy efficient instruction access. A BLISS-based front-end leads to 14% IPC, 16% total energy, and 83% energy-delay-squared product improvements for wide-issue processors