ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Efficient hardware for multiway jumps and pre-fetches
MICRO 18 Proceedings of the 18th annual workshop on Microprogramming
Branch folding in the CRISP microprocessor: reducing branch delay to zero
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
A VLIW architecture for a trace Scheduling Compiler
IEEE Transactions on Computers - Special issue on architectural support for programming languages and operating systems
A Development Environment for Horizontal Microcode
IEEE Transactions on Software Engineering
Very Long Instruction Word architectures and the ELI-512
ISCA '83 Proceedings of the 10th annual international symposium on Computer architecture
2n-way jump microinstruction hardware and an effective instruction binding method
MICRO 13 Proceedings of the 13th annual workshop on Microprogramming
Percolation Scheduling: A Parallel Compilation Technique
Percolation Scheduling: A Parallel Compilation Technique
Bulldog: a compiler for vliw architectures (parallel computing, reduced-instruction-set, trace scheduling, scientific)
An efficient resource-constrained global scheduling technique for superscalar and VLIW processors
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
A study on the number of memory ports in multiple instruction issue machines
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Generalized Multiway Branch Unit for VLIW Microprocessors
IEEE Transactions on Parallel and Distributed Systems
Hi-index | 0.00 |
A VLIW architecture capable of testing multiple conditions in one cycle must support effective multiway (conditional) jumps. In this paper, a hardware-implemented, fast, and space-efficient multi-way jump mechanism is developed that speeds up the execution of multiple conditional jumps and reduces wasted storage. A cluster of multiple conditional jumps packed in an instruction can form an arbitrary rooted DAG (Directed Acyclic Graph), where each node corresponds to a condition. Our scheme provides a hardware device called an M-unit, which can combinationally produce the next target address using an encoded description of the DAG and the actual test bits. A technique to reduce the number of different configurations is introduced, along with a memory packing scheme that minimizes wasted memory.