Block-aware instruction set architecture

Authors:
Ahmad Zmily;Christos Kozyrakis
Affiliations:
Stanford University, Palo Alto, CA;Stanford University, Palo Alto, CA
Venue:
ACM Transactions on Architecture and Code Optimization (TACO)
Year:
2006

Citing 42
Cited 2

A comprehensive instruction fetch mechanism for a processor supporting speculative execution

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
A comparison of dynamic branch predictors that use two levels of branch history

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Fast and accurate instruction fetch and branch prediction

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
A performance study of software and hardware data prefetching schemes

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Enhancing instruction scheduling with a block-structured ISA

International Journal of Parallel Programming
Control flow prediction with tree-like subgraphs for superscalar processors

Proceedings of the 28th annual international symposium on Microarchitecture
Multiple-block ahead branch predictors

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Trace cache: a low latency approach to high bandwidth instruction fetching

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Increasing the instruction fetch rate via block-structured instruction set architectures

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Control flow prediction for dynamic ILP processors

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Path-based next trace prediction

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Alternative fetch and issue policies for the trace cache fetch mechanism

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Reducing the performance impact of instruction cache misses by writing instructions into the reservation stations out-of-order

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Memory system characterization of commercial workloads

Proceedings of the 25th annual international symposium on Computer architecture
Improving trace cache effectiveness with branch promotion and trace packing

Proceedings of the 25th annual international symposium on Computer architecture
The block-based trace cache

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
A scalable front-end architecture for fast instruction delivery

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Enhanced code compression for embedded RISC processors

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Reducing power in superscalar processor caches using subbanking, multiple line buffers and bit-line segmentation

ISLPED '99 Proceedings of the 1999 international symposium on Low power electronics and design
Fetch directed instruction prefetching

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Selective cache ways: on-demand cache resource allocation

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Wattch: a framework for architectural-level power analysis and optimizations

Proceedings of the 27th annual international symposium on Computer architecture
Dynamo: a transparent dynamic optimization system

PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Optimizations Enabled by a Decoupled Front-End Architecture

IEEE Transactions on Computers
Slice-processors: an implementation of operation-based prediction

ICS '01 Proceedings of the 15th international conference on Supercomputing
Reducing set-associative cache energy via way-prediction and selective direct-mapping

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
SPEC CPU2000: Measuring CPU Performance in the New Millennium

Computer
Performance of the decoupled ACRI-1 architecture: the perfect club

HPCN Europe '95 Proceedings of the International Conference and Exhibition on High-Performance Computing and Networking
High Performance and Energy Efficient Serial Prefetch Architecture

ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Fetching instruction streams

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
The reduction of branch instruction execution overhead using structured control flow

ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
Power-Aware Control Speculation through Selective Throttling

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Branch Predictor Prediction: A Power-Aware Branch Predictor for High-Performance Processors

ICCD '02 Proceedings of the 2002 IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD'02)
Parallelism in the front-end

Proceedings of the 30th annual international symposium on Computer architecture
Effective ahead pipelining of instruction block address generation

Proceedings of the 30th annual international symposium on Computer architecture
An Efficient Compiler Technique for Code Size Reduction Using Reduced Bit-Width ISAs

Proceedings of the conference on Design, automation and test in Europe
Power Issues Related to Branch Prediction

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Low-power Branch Target Buffer for Application-Specific Embedded Processors

DSD '03 Proceedings of the Euromicro Symposium on Digital Systems Design
A low-complexity fetch architecture for high-performance superscalar processors

ACM Transactions on Architecture and Code Optimization (TACO)
Energy-efficient and high-performance instruction fetch using a block-aware ISA

ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Improving instruction delivery with a block-aware ISA

Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Power-aware branch logic: a hardware based technique for filtering access to branch logic

SAMOS'05 Proceedings of the 5th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation

A low power front-end for embedded processors using a block-aware instruction set

CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Implementing dynamic implied addressing mode for multi-output instructions

CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Instruction delivery is a critical component for wide-issue, high-frequency processors since its bandwidth and accuracy place an upper limit on performance. The processor front-end accuracy and bandwidth are limited by instruction-cache misses, multicycle instruction-cache accesses, and target or direction mispredictions for control-flow operations. This paper presents a block-aware instruction set (BLISS) that allows software to assist with front-end challenges. BLISS defines basic block descriptors that are stored separately from the actual instructions in a program. We show that BLISS allows for a decoupled front-end that tolerates instruction-cache latency, facilitates instruction prefetching, and leads to higher prediction accuracy.