Custom Wide Counterflow Pipelines for High-Performance Embedded Applications

Authors:
Bruce R. Childers; Jack W. Davidson
Venue:
IEEE Transactions on Computers
Year:
2004

References:

A portable global optimizer and linker. PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation.
An architecture framework for application-specific and scalable architectures. ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture.
Viewing instruction set design as an optimization problem. MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture.
Splash 2. SPAA '92 Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures.
Iterative modulo scheduling: an algorithm for software pipelining loops. MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture.
A high-performance microarchitecture with hardware-programmable functional units. MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture.
Characterizing the impact of predicated execution on branch prediction. MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture.
PRISC: programmable reduced instruction set computers.
A comparison of full and partial predicated execution support for ILP processors. ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture.
An evaluation system for application specific architectures. MICRO 23 Proceedings of the 23rd annual workshop and symposium on Microprogramming and microarchitecture.
MediaBench: a tool for evaluating and synthesizing multimedia and communications systems. MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture.
Managing pipeline-reconfigurable FPGAs. FPGA '98 Proceedings of the 1998 ACM/SIGDA sixth international symposium on Field programmable gate arrays.
Designing Control Logic for Counterflow Pipeline Processor Using Petri Nets. Formal Methods in System Design.
Data-path synthesis of VLIW video signal processors. Proceedings of the 11th international symposium on System synthesis.
Reuse methodology manual: for system-on-a-chip designs.
A reconfigurable arithmetic array for multimedia applications. FPGA '99 Proceedings of the 1999 ACM/SIGDA seventh international symposium on Field programmable gate arrays.
PipeRench: a coprocessor for streaming multimedia acceleration. ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture.
Surviving the SOC revolution: a guide to platform-based design.
ShiftQ: a buffered interconnect for custom loop accelerators. CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems.
Microprocessor Architectures: From VLIW to TTA.
Baring It All to Software: Raw Machines. Computer.
Trends in Embedded-Microprocessor Design. Computer.
PICO: Automatically Designing Custom Computers. Computer.
The Counterflow Pipeline Processor Architecture. IEEE Design & Test.
Deep-Submicron Microprocessor Design Issues. IEEE Micro.
Formal Verification of Counterflow Pipeline Architecture. Proceedings of the 8th International Workshop on Higher Order Logic Theorem Proving and Its Applications.
A Design Environment for Counterflow Pipeline Synthesis. LCTES '98 Proceedings of the ACM SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded Systems.
RaPiD - Reconfigurable Pipelined Datapath. FPL '96 Proceedings of the 6th International Workshop on Field-Programmable Logic, Smart Applications, New Paradigms and Compilers.
High-Level Synthesis of Nonprogrammable Hardware Accelerators. ASAP '00 Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures, and Processors.
On the Correctness of the Sproull Counterflow Pipeline Processor. ASYNC '96 Proceedings of the 2nd International Symposium on Advanced Research in Asynchronous Circuits and Systems.
A Counterflow Pipeline Experiment. ASYNC '99 Proceedings of the 5th International Symposium on Advanced Research in Asynchronous Circuits and Systems.
Architectural Considerations for Application-Specific Counterflow Pipelines. ARVLSI '99 Proceedings of the 20th Anniversary Conference on Advanced Research in VLSI.
Garp: a MIPS processor with a reconfigurable coprocessor. FCCM '97 Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines.
Mapping applications to the RaPiD configurable architecture. FCCM '97 Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines.
Specifying and Compiling Applications for RaPiD. FCCM '98 Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines.
An Infrastructure for Designing Custom Embedded Counter-Flow Pipelines. HICSS '00 Proceedings of the 33rd Hawaii International Conference on System Sciences, Volume 8.
Advances of the Counterflow Pipeline Microarchitecture. HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture.
Non-Stalling CounterFlow Architecture. HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture.
Automatic Architectural Synthesis of VLIW and EPIC Processors. Proceedings of the 12th international symposium on System synthesis.
A dynamic instruction set computer. FCCM '95 Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines.
Automatic exploration of VLIW processor architectures from a designer's experience based specification. CODES '94 Proceedings of the 3rd international workshop on Hardware/software co-design.
Synthesis of application specific instruction sets. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

Abstract

Application-specific instruction set processor (ASIP) design is a promising technique for meeting the performance and cost goals of high-performance systems. ASIPs are especially valuable for embedded computing applications (e.g., digital cameras, color printers, and cellular phones), where a small increase in performance or decrease in cost can have a large impact on a product's viability. Sutherland, Sproull, and Molnar originally proposed a processor organization called the counterflow pipeline (CFP) as a general-purpose architecture. We observed that the CFP is appropriate for ASIP design because of its simple and regular structure, local control and communication, and high degree of modularity. This paper describes a new CFP architecture, called the wide counterflow pipeline (WCFP), that extends the original proposal to make it better suited to custom embedded instruction-level parallel processors. This work presents a novel and practical application of the CFP to the automatic, quick-turnaround design of ASIPs. The paper introduces the WCFP architecture and describes several microarchitecture capabilities needed to get good performance from custom WCFPs. We demonstrate that custom WCFPs achieve performance up to four times better than that of ASIPs based on the CFP. Using an analytic cost model, we show that custom WCFPs do not unduly increase the cost of the original counterflow pipeline architecture and that they retain the simplicity of the CFP. We also compare custom WCFPs to custom VLIW architectures and demonstrate that the WCFP is performance-competitive with traditional VLIWs without requiring complicated global interconnection of functional devices.