Instruction fetch mechanisms for VLIW architectures with compressed encodings
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Compiler-driven cached code compression schemes for embedded ILP processors
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
An operation rearrangement technique for power optimization in VLIM instruction fetch
Proceedings of the conference on Design, automation and test in Europe
An Instruction Buffer for a Low-Power DSP
ASYNC '00 Proceedings of the 6th International Symposium on Advanced Research in Asynchronous Circuits and Systems
Instruction Scheduling for Clustered VLIW DSPs
PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
Computer Architecture: A Quantitative Approach
Computer Architecture: A Quantitative Approach
An Efficient VLIW DSP Architecture for Baseband Processing
ICCD '03 Proceedings of the 21st International Conference on Computer Design
A unified processor architecture for RISC & VLIW DSP
GLSVLSI '05 Proceedings of the 15th ACM Great Lakes symposium on VLSI
MiBench: A free, commercially representative embedded benchmark suite
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
Application-Specific Data Path for Highly Efficient Computation of Multistandard Video Codecs
IEEE Transactions on Circuits and Systems for Video Technology
High-speed low-power multiplexer-based selector for priority policy
Computers and Electrical Engineering
Hi-index | 0.00 |
The instruction compression mechanism used to solve the drawbacks of traditional very long instruction word (VLIW) architectures often leads to poor code density in the instruction cache, which causes the irregular lengths of long instructions to cross the different cache line. These split long instructions cannot be fetched simultaneously, which creates a bottleneck for VLIW architectures. This paper proposes a buffing mechanism which can slide the split long instruction as a continuous form to offer better efficiency in instruction fetching. This approach helps maintain the behaviors of the software pipeline technology, which schedules iterative instructions to enhance the performance of streaming processing for VLIW architectures. In the proposed mechanism, the instruction stream buffer stores the repeat block completely and suspends as far as possible the cache access to reduce access time. The advantages of repeatedly issuing instructions in the instruction buffer and preventing split long instructions, can substantially improve the performance in fetching instructions. Simulation results show that the mechanism is efficient at the instruction level for the basic DSP/IMG library by improving performance by 35% on average.