Abstract: Application-specific instruction set processor (ASIP) design is a promising technique to meet the performance and cost goals of high-performance systems. ASIPs are especially valuable for embedded computing applications (e.g., digital cameras, color printers, and cellular phones), where a small increase in performance and decrease in cost can have a large impact on a product's viability. Sutherland, Sproull, and Molnar originally proposed a processor organization called the counterflow pipeline (CFP) as a general-purpose architecture. We observed that the CFP is appropriate for ASIP design due to its simple and regular structure, local control and communication, and high degree of modularity. This paper describes a new CFP architecture, called the wide counterflow pipeline (WCFP), that extends the original proposal to better suit custom embedded instruction-level parallel processors. This work presents a novel and practical application of the CFP to automatic, quick-turnaround design of ASIPs. The paper introduces the WCFP architecture and describes several microarchitecture capabilities needed to get good performance from custom WCFPs. We demonstrate that custom WCFPs have performance that is up to four times better than that of ASIPs based on the CFP. Using an analytic cost model, we show that custom WCFPs do not unduly increase cost over the original counterflow pipeline architecture and that they retain the simplicity of the CFP. We also compare custom WCFPs to custom VLIW architectures and demonstrate that the WCFP is performance-competitive with traditional VLIWs without requiring complicated global interconnection of functional devices.