An EPIC Processor with Pending Functional Units

Authors:
Lori Carter; Weihaw Chuang; Brad Calder
Venue:
ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Year:
2002

Citing 6

Dynamically scheduled VLIW processors. MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture.
Itanium Processor Microarchitecture. IEEE Micro.
Basic Block Distribution Analysis to Find Periodic Behavior and Simulation Points in Applications. Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques.
Register Renaming and Scheduling for Dynamic Execution of Predicated Code. HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture.
Design of a Computer—The Control Data 6600.
An efficient algorithm for exploiting multiple arithmetic units. IBM Journal of Research and Development.

Cited 1

A general framework to build new CPUs by mapping abstract machine code to instruction level parallel execution hardware. ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05.


Abstract

The Itanium processor, an implementation of an Explicitly Parallel Instruction Computing (EPIC) architecture, is an in-order processor that fetches, executes, and forwards results to functional units in order. The architecture relies heavily on the compiler to expose Instruction Level Parallelism (ILP) and so avoid the stalls created by in-order processing.

The goal of this paper is to examine, in small steps, how the in-order Itanium processor model can be changed to allow out-of-order execution, in order to overcome memory and functional unit latencies. To accomplish this, we consider an architecture with Pending Functional Units (PFU). The PFU architecture assigns (schedules) instructions to functional units in order. Instructions sit at the pending functional units until their operands become ready and then execute out of order. While an instruction is pending at a functional unit, no other instruction can be scheduled to that functional unit. We examine several PFU architecture designs. The minimal design does not perform renaming and supports bypassing only of non-speculative result values. We then make PFU more aggressive, first by supporting speculative register state and finally by adding register renaming. We show that the minimal PFU architecture provides on average an 18% speedup over an in-order EPIC processor and produces up to half of the speedup that would be gained using a full out-of-order architecture.
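The difference between in-order issue and pending functional units can be illustrated with a toy timing model. The sketch below is not the simulator used in the paper. The `Instr` record, the `run` function, the unit names, and the latencies are all hypothetical. It also assumes one instruction is assigned per cycle, that units are pipelined, and that no register is written twice, so renaming hazards never arise.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    dest: str      # destination register
    srcs: tuple    # source registers
    unit: str      # functional unit the instruction is assigned to
    latency: int   # cycles from start of execution to result


def run(program, pending_units):
    """Return the cycle at which the last result is produced.

    pending_units=False: an instruction is not assigned until its operands
    are ready, so every later instruction stalls behind it.
    pending_units=True: an instruction is assigned to a free unit in program
    order and waits there for its operands, blocking only that unit.
    """
    ready_at = {}      # register -> cycle its value becomes available
    unit_free_at = {}  # unit -> first cycle a new instruction may be assigned
    issue_cycle = 0    # next cycle at which an instruction may be assigned
    finish = 0
    for ins in program:
        operands_ready = max((ready_at.get(r, 0) for r in ins.srcs), default=0)
        unit_free = unit_free_at.get(ins.unit, 0)
        if pending_units:
            assign = max(issue_cycle, unit_free)
            start = max(assign, operands_ready)  # wait at the unit
        else:
            assign = max(issue_cycle, unit_free, operands_ready)
            start = assign
        done = start + ins.latency
        ready_at[ins.dest] = done
        unit_free_at[ins.unit] = start + 1  # unit is held while pending
        issue_cycle = assign + 1            # assignment stays in order
        finish = max(finish, done)
    return finish


# Two independent load-use pairs; the 10-cycle load latency is made up.
program = [
    Instr("ld1", "r1", (), "mem", 10),
    Instr("add1", "r2", ("r1",), "alu0", 1),
    Instr("ld2", "r3", (), "mem", 10),
    Instr("add2", "r4", ("r3",), "alu1", 1),
]

in_order = run(program, pending_units=False)
pfu = run(program, pending_units=True)
print(in_order, pfu)  # prints "22 13"
```

In this example the in-order model cannot assign the second load until the first add has received its operand, so the two load latencies are serialized. With pending units the first add waits at `alu0` while the second load is assigned to the memory unit, and the two loads overlap.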