Performance evaluation for a compressed-VLIW processor

Authors:
Sunghyun Jee; Kannappan Palaniappan
Affiliations:
Chonan College in Foreign Studies, 393 Anseo Dong, Cheonan, Chungcheong Namdo, South Korea, 330-705; University of Missouri, Columbia, MO
Venue:
Proceedings of the 2002 ACM symposium on Applied computing
Year:
2002

Abstract

This paper presents a new instruction-level parallelism (ILP) processor architecture called Compressed VLIW (CVLIW). The CVLIW processor builds a sequence of long instructions by removing nearly all NOPs (No OPerations) and LNOPs (Long NOPs) from VLIW code. It schedules each instruction within a long instruction individually, using paired functional units and dynamic schedulers. Each dynamic scheduler checks for data dependencies and resource collisions as it schedules each instruction. We simulate the architecture and show that the CVLIW processor outperforms the VLIW processor over a wide range of cache sizes and across a variety of numerical benchmark applications. These gains come from individual instruction scheduling and from the reduced size of the object code. Even when the cache is assumed to have a zero miss rate, the CVLIW processor still performs 9% to 15% better than the VLIW processor on every benchmark application tested.
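The abstract does not specify the CVLIW instruction encoding or the exact scheduler logic, so the following Python sketch is only an illustration under stated assumptions, not the authors' design. A long instruction is modeled as a list of per-functional-unit slots with `None` as a NOP, and an all-`None` word stands for an LNOP. Compressed ops keep their slot index so they can still be routed to the matching functional unit and dynamic scheduler pair. Units are assumed non-pipelined, and each register is assumed to be written once, so only read-after-write dependencies are checked. All names (`Op`, `compress`, `simulate`, `EXAMPLE_VLIW`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Op:
    """One operation in a VLIW slot (hypothetical format)."""
    name: str
    dest: Optional[str]
    srcs: tuple[str, ...] = ()
    latency: int = 1


# A VLIW long instruction: one entry per functional-unit slot; None is a NOP.
LongInstr = list[Optional[Op]]


def compress(vliw: list[LongInstr]) -> list[list[tuple[int, Op]]]:
    """Drop NOP slots and LNOPs (all-NOP long instructions).

    Each surviving op keeps its slot index (functional unit number).
    """
    out = []
    for word in vliw:
        kept = [(fu, op) for fu, op in enumerate(word) if op is not None]
        if kept:  # an empty list here means the word was an LNOP
            out.append(kept)
    return out


def simulate(cvliw: list[list[tuple[int, Op]]], num_fus: int) -> int:
    """Return the cycle at which the last result is available.

    Assumptions: each unit's scheduler issues its own ops in program order;
    an op issues once every earlier producer of its sources has completed
    (data dependency) and its unit is idle (resource collision); units are
    non-pipelined; registers are written once, so WAR/WAW are ignored.
    """
    ops = [(fu, op) for word in cvliw for fu, op in word]  # program order
    last_writer: dict[str, int] = {}
    producers = []
    for i, (_, op) in enumerate(ops):
        producers.append([last_writer[r] for r in op.srcs if r in last_writer])
        if op.dest is not None:
            last_writer[op.dest] = i

    queues = [[i for i, (fu, _) in enumerate(ops) if fu == f] for f in range(num_fus)]
    heads = [0] * num_fus
    fu_free_at = [0] * num_fus
    done_at: list[Optional[int]] = [None] * len(ops)
    cycle = 0
    while any(h < len(q) for h, q in zip(heads, queues)):
        for f in range(num_fus):
            if heads[f] == len(queues[f]) or fu_free_at[f] > cycle:
                continue  # nothing left, or the unit is still busy
            i = queues[f][heads[f]]
            if all(done_at[p] is not None and done_at[p] <= cycle
                   for p in producers[i]):
                done_at[i] = cycle + ops[i][1].latency
                fu_free_at[f] = cycle + ops[i][1].latency
                heads[f] += 1
        cycle += 1
    return max((d for d in done_at if d is not None), default=0)


# Toy two-unit trace: word 1 is an LNOP, and words 0, 2, 3 contain NOP slots.
EXAMPLE_VLIW: list[LongInstr] = [
    [Op("load", "r1", (), 2), Op("li", "r2")],
    [None, None],
    [Op("add", "r3", ("r1", "r2")), None],
    [None, Op("mul", "r4", ("r2",))],
]

if __name__ == "__main__":
    packed = compress(EXAMPLE_VLIW)
    print(len(packed), sum(len(w) for w in packed))
    print(simulate(packed, num_fus=2))
```

Under these assumptions, the four-word toy program compresses to three words holding four operations. In this trace the `mul` issues in cycle 1, ahead of the `add`, because unit 1's scheduler waits only on the `mul`'s own operand rather than on the position of its original long instruction.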