A portable global optimizer and linker. PLDI '88: Proceedings of the ACM SIGPLAN 1988 Conference on Programming Language Design and Implementation.
Instruction-level parallel processing: history, overview, and perspective. The Journal of Supercomputing, Special Issue on Instruction-Level Parallelism.
A 100-MIPS GaAs Asynchronous Microprocessor. IEEE Design & Test.
A high-performance microarchitecture with hardware-programmable functional units. MICRO 27: Proceedings of the 27th Annual International Symposium on Microarchitecture.
Internal organization of the Alpha 21164, a 300-MHz 64-bit quad-issue CMOS RISC microprocessor. Digital Technical Journal, Special 10th Anniversary Issue.
Computer architecture (2nd ed.): a quantitative approach.
Hardware-Software Cosynthesis for Digital Systems. IEEE Design & Test.
The Counterflow Pipeline Processor Architecture. IEEE Design & Test.
A Design Environment for Counterflow Pipeline Synthesis. LCTES '98: Proceedings of the ACM SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded Systems.
Fred: An Architecture for a Self-Timed Decoupled Computer. ASYNC '96: Proceedings of the 2nd International Symposium on Advanced Research in Asynchronous Circuits and Systems.
AMULET2e: An Asynchronous Embedded Controller. ASYNC '97: Proceedings of the 3rd International Symposium on Advanced Research in Asynchronous Circuits and Systems.
Advanced performance features of the 64-bit PA-8000. COMPCON '95: Proceedings of the 40th IEEE Computer Society International Conference.
The Alpha 21264: A 500 MHz Out-of-Order Execution Microprocessor. COMPCON '97: Proceedings of the 42nd IEEE International Computer Conference.
Garp: a MIPS processor with a reconfigurable coprocessor. FCCM '97: Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines.
Mapping applications to the RaPiD configurable architecture. FCCM '97: Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines.
Advances of the Counterflow Pipeline Microarchitecture. HPCA '97: Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture.
Non-Stalling CounterFlow Architecture. HPCA '98: Proceedings of the 4th International Symposium on High-Performance Computer Architecture.
Automatic design of computer instruction sets.
Synthesis of application specific instruction sets. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
Custom Wide Counterflow Pipelines for High-Performance Embedded Applications. IEEE Transactions on Computers.
Application-specific processor design is a promising approach for meeting the performance and cost goals of a system. Application-specific processors are especially promising for embedded systems (e.g., digital cameras and cellular phones), where a small increase in performance and decrease in cost can have a large impact on a product's viability. Sutherland, Sproull, and Molnar have proposed a new pipeline organization called the Counterflow Pipeline (CFP). This paper evaluates CFP design alternatives and shows that the CFP is an ideal architecture for the fast, low-cost design of high-performance processors customized for computation-intensive embedded applications. First, we describe why CFPs are particularly well suited to realizing application-specific processors. Second, we describe how a CFP tailored to an application can be constructed automatically. Third, we present measurements that evaluate CFP design trade-offs and show that CFPs provide speculative and out-of-order execution and register renaming, all matched to an application. Fourth, we show that asynchronous counterflow pipelines achieve high performance by reducing the average execution latency of instructions relative to synchronous implementations. Finally, we demonstrate that custom CFPs achieve cycles-per-instruction (CPI) figures competitive with those of 4-way superscalar out-of-order processors, with potentially low design complexity.
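The claim that asynchronous pipelines cut *average* latency rests on one idea: a clocked stage rounds every operation up to a whole number of clock periods, while a self-timed stage lets each operation finish as soon as it is done. The Python sketch below illustrates only that idea under loudly stated assumptions. The instruction mix, the per-operation latencies, and the 2.5 ns clock period are hypothetical numbers, not measurements from the paper, and the model ignores handshake overhead in the asynchronous case. It is not the paper's evaluation method.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class OpClass:
    name: str
    latency_ns: float  # hypothetical intrinsic completion time of this operation
    fraction: float    # hypothetical share of the dynamic instruction mix


def synchronous_avg_latency(mix, clock_ns):
    # Clocked model (assumption): each operation occupies a whole number of
    # clock periods, so its latency is rounded up to the next clock edge.
    return sum(
        math.ceil(op.latency_ns / clock_ns) * clock_ns * op.fraction
        for op in mix
    )


def asynchronous_avg_latency(mix):
    # Self-timed model (assumption): each operation completes after its own
    # latency; handshake overhead is ignored for simplicity.
    return sum(op.latency_ns * op.fraction for op in mix)


# Hypothetical mix: many fast ALU operations, a few slow multiplies.
MIX = [
    OpClass("add", 2.0, 0.6),
    OpClass("shift", 1.5, 0.3),
    OpClass("multiply", 6.0, 0.1),
]
CLOCK_NS = 2.5  # hypothetical clock period, chosen to fit the add

if __name__ == "__main__":
    print(f"synchronous:  {synchronous_avg_latency(MIX, CLOCK_NS):.2f} ns")
    print(f"asynchronous: {asynchronous_avg_latency(MIX):.2f} ns")
```

With these made-up numbers the clocked model averages about 3.0 ns per operation and the self-timed model about 2.25 ns, because the fast add and shift operations no longer pay for the unused remainder of a clock period. A real asynchronous design adds handshake delay to every transfer, so the actual gain depends on how that overhead compares with the clock-quantization loss.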