VLIW architectures use very wide instruction words in conjunction with high bandwidth to the instruction cache to achieve multiple instruction issue. This report uses the TINKER experimental testbed to examine instruction fetch and instruction cache mechanisms for VLIWs. A compressed instruction encoding for VLIWs is defined and a classification scheme for i-fetch hardware for such an encoding is introduced. Several interesting cache and i-fetch organizations are described and evaluated through trace-driven simulations. A new i-fetch mechanism using a silo cache is found to have the best performance.