PipeRench: a coprocessor for streaming multimedia acceleration

Authors:
Seth Copen Goldstein; Herman Schmit; Matthew Moe; Mihai Budiu; Srihari Cadambi; R. Reed Taylor; Ronald Laufer
Affiliations:
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA; Department of ECE, Carnegie Mellon University, Pittsburgh, PA; Department of ECE, Carnegie Mellon University, Pittsburgh, PA; School of Computer Science, Carnegie Mellon University, Pittsburgh, PA; Department of ECE, Carnegie Mellon University, Pittsburgh, PA; Department of ECE, Carnegie Mellon University, Pittsburgh, PA; Department of ECE, Carnegie Mellon University, Pittsburgh, PA
Venue:
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Year:
1999

Abstract

Future computing workloads will emphasize an architecture's ability to perform relatively simple calculations on massive quantities of mixed-width data. This paper describes a novel reconfigurable fabric architecture, PipeRench, optimized to accelerate these types of computations. PipeRench enables fast, robust compilers, supports forward compatibility, and virtualizes configurations, thus removing the fixed size constraint present in other fabrics. For the first time we explore how the bit-width of processing elements affects performance and show how the PipeRench architecture has been optimized to balance the needs of the compiler against the realities of silicon. Finally, we demonstrate extreme performance speedup on certain computing kernels (up to 190x versus a modern RISC processor), and analyze how this acceleration translates to application speedup.
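
The abstract's closing point, that kernel speedup does not carry over one-to-one to application speedup, can be illustrated with Amdahl's law. The sketch below is not taken from the paper: the function name `application_speedup` and the kernel fractions are hypothetical, chosen only for illustration, and the only figure drawn from the abstract is the 190x kernel speedup.

```python
def application_speedup(kernel_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: overall speedup when only part of the run time is accelerated.

    kernel_fraction -- share of the original run time spent in the accelerated kernel (0..1)
    kernel_speedup  -- factor by which that kernel alone gets faster
    """
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    # The unaccelerated part keeps its cost; the kernel part shrinks by kernel_speedup.
    return 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_speedup)


if __name__ == "__main__":
    # Hypothetical kernel fractions; 190x is the peak kernel speedup quoted in the abstract.
    for fraction in (0.5, 0.9, 0.99):
        overall = application_speedup(fraction, 190.0)
        print(f"kernel fraction {fraction:.2f}: application speedup {overall:.2f}x")
```

Under this model the application speedup is bounded by 1 / (1 - kernel_fraction) no matter how fast the kernel becomes, so the share of run time spent in the accelerated kernel matters as much as the raw kernel speedup.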