Instruction fetch mechanisms for VLIW architectures with compressed encodings

Authors:
Thomas M. Conte;Sanjeev Banerjia;Sergei Y. Larin;Kishore N. Menezes;Sumedh W. Sathaye
Affiliations:
Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, North Carolina (all authors)
Venue:
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Year:
1996

Abstract

VLIW architectures use very wide instruction words in conjunction with high bandwidth to the instruction cache to achieve multiple instruction issue. This report uses the TINKER experimental testbed to examine instruction fetch and instruction cache mechanisms for VLIWs. A compressed instruction encoding for VLIWs is defined and a classification scheme for i-fetch hardware for such an encoding is introduced. Several interesting cache and i-fetch organizations are described and evaluated through trace-driven simulations. A new i-fetch mechanism using a silo cache is found to have the best performance.