Parallelizing Applications into Silicon

  • Authors:
  • Jonathan Babb; Martin Rinard; Csaba Andras Moritz; Walter Lee; Matthew Frank; Rajeev Barua; Saman Amarasinghe

  • Venue:
  • FCCM '99 Proceedings of the Seventh Annual IEEE Symposium on Field-Programmable Custom Computing Machines
  • Year:
  • 1999

Abstract

The next decade of computing will be dominated by embedded systems, information appliances, and application-specific computers. To build these systems, designers will need high-level compilation and CAD tools that generate architectures that effectively meet the needs of each application. In this paper we present a novel compilation system that allows sequential programs, written in C or FORTRAN, to be compiled directly into custom silicon or reconfigurable architectures. This capability is also interesting because trends in computer architecture are moving toward more reconfigurable, hardware-like substrates, such as FPGA-based systems. Our system works by combining two resource-efficient computing disciplines: Small Memories and Virtual Wires.

For a given application, the compiler first analyzes the memory access patterns of pointers and arrays in the program and constructs a partitioned memory system made up of many small memories. The computation is implemented by active computing elements that are spatially distributed within the memory array. A space-time scheduler assigns instructions to the computing elements in a way that maximizes locality and minimizes physical communication distance; it also generates an efficient static schedule for the interconnect. Finally, specialized hardware for the resulting schedule of memory accesses, wires, and computation is generated as a multi-process state machine in synthesizable Verilog.

With this system, implemented as a set of SUIF compiler passes, we have successfully compiled programs into hardware. Specialization alone yields performance improvements of up to an order of magnitude over a single general-purpose processor, and parallelization yields additional speedups similar to those obtainable on a tightly interconnected multiprocessor.
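To make the memory-partitioning step more concrete, the sketch below shows, in C, the kind of transformation the abstract describes: a single-array access pattern split across several small memory banks, each of which would pair with its own compute element in the generated hardware. This is only an illustrative approximation under assumed names and parameters (N, NUM_BANKS, sum_partitioned); it is not the compiler's actual analysis or output.

```c
/* Illustrative sketch only: a loop of the kind the compiler accepts,
 * and a hand-written approximation of memory partitioning.
 * N, NUM_BANKS, and all function names are hypothetical. */

#define N          1024
#define NUM_BANKS  4               /* assumed number of small memories */
#define BANK_SIZE  (N / NUM_BANKS)

/* Original sequential code: one monolithic array. */
static int a[N];

int sum_original(void)
{
    int sum = 0;
    for (int i = 0; i < N; i++)
        sum += a[i];
    return sum;
}

/* After partitioning, each bank would map to its own small memory with
 * an adjacent compute element; in hardware the partial sums proceed
 * concurrently and are combined over the statically scheduled interconnect. */
static int bank[NUM_BANKS][BANK_SIZE];

int sum_partitioned(void)
{
    int partial[NUM_BANKS] = {0};

    for (int b = 0; b < NUM_BANKS; b++)      /* parallel across banks in hardware */
        for (int i = 0; i < BANK_SIZE; i++)
            partial[b] += bank[b][i];

    int sum = 0;
    for (int b = 0; b < NUM_BANKS; b++)      /* final reduction of partial sums */
        sum += partial[b];
    return sum;
}
```

In the paper's flow, a transformation of this flavor would be performed automatically from the memory-access analysis, and the resulting banks, compute elements, and reduction would be emitted as a multi-process state machine in synthesizable Verilog rather than as C.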