Input data reuse in compiling window operations onto reconfigurable hardware

Authors:
Zhi Guo; Betul Buyukkurt; Walid Najjar
Affiliations:
University of California, Riverside, CA (all authors)
Venue:
Proceedings of the 2004 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Year:
2004

Abstract

Balancing computation with I/O is considered a critical factor in the overall performance of embedded systems in general and reconfigurable computing systems in particular. Data I/O often dominates the overall performance of window operations, which are frequently used in image processing, image compression, pattern recognition and digital signal processing. The problem is more acute in reconfigurable systems because the compiler must generate both the data path and the sequence of operations. The challenge is to exploit data reuse on the reconfigurable fabric (FPGA) intelligently, minimizing the required memory or I/O bandwidth while maximizing parallelism.

In this paper, we present a compile-time approach to reusing data in window-based codes. The compiler, called ROCCC, first analyzes and optimizes the window operation written in C. It then computes the size of the hardware buffer and defines three sets of data values for each window: the window set, the managed set and the killed set. This compile-time analysis simplifies HDL code generation and improves the performance of the resulting hardware. We also discuss in-place window operations.
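The three sets can be made concrete with a small simulation. This is a minimal sketch, not ROCCC's analysis. It assumes the following:

- A kh x kw window moves with stride 1.
- Data is reused only between horizontally adjacent windows. This is a simplification; reusing data across rows of windows would need a larger buffer.
- The sets have the meanings given below. This reading is an interpretation for illustration, not the paper's formal definition:
  - The window set is every element the current window covers.
  - The managed set is the part of the window not held over from the previous window, which must therefore be loaded from memory.
  - The killed set is the part the next window no longer needs, so its buffer slots can be freed.

The function names (`window_set`, `managed_and_killed`, `window_sum_with_reuse`, `window_sum_naive`) and the dict-based buffer are hypothetical. In hardware the buffer would be registers or on-chip RAM.

```python
from typing import Dict, List, Set, Tuple

Index = Tuple[int, int]  # (row, col) of one input element


def window_set(i: int, j: int, kh: int, kw: int) -> Set[Index]:
    """Input elements covered by the kh x kw window whose top-left corner is (i, j)."""
    return {(i + u, j + v) for u in range(kh) for v in range(kw)}


def managed_and_killed(i: int, j: int, kh: int, kw: int,
                       ow: int) -> Tuple[Set[Index], Set[Index]]:
    """Assumed set definitions, with reuse only between horizontally adjacent windows.

    managed: elements of window (i, j) not held over from window (i, j - 1), so they must be loaded.
    killed:  elements of window (i, j) that window (i, j + 1) does not need, so they can be evicted.
    The last window in a row kills everything because no reuse across rows is modeled.
    """
    cur = window_set(i, j, kh, kw)
    prev = window_set(i, j - 1, kh, kw) if j > 0 else set()
    nxt = window_set(i, j + 1, kh, kw) if j + 1 < ow else set()
    return cur - prev, cur - nxt


def window_sum_with_reuse(img: List[List[int]], kh: int,
                          kw: int) -> Tuple[List[List[int]], int, int]:
    """Window sum that loads an element only when it is in the managed set.

    Returns (output, number of memory reads, peak buffer occupancy).
    """
    h, w = len(img), len(img[0])
    oh, ow = h - kh + 1, w - kw + 1
    out = [[0] * ow for _ in range(oh)]
    buffer: Dict[Index, int] = {}  # stands in for the on-chip buffer
    reads = peak = 0
    for i in range(oh):
        for j in range(ow):
            managed, killed = managed_and_killed(i, j, kh, kw, ow)
            for r, c in managed:  # fetch only data not already buffered
                buffer[(r, c)] = img[r][c]
                reads += 1
            peak = max(peak, len(buffer))
            out[i][j] = sum(buffer[p] for p in window_set(i, j, kh, kw))
            for p in killed:  # free slots the next window will not read
                del buffer[p]
    return out, reads, peak


def window_sum_naive(img: List[List[int]], kh: int,
                     kw: int) -> Tuple[List[List[int]], int]:
    """Reference version: every window re-reads all kh * kw of its elements."""
    h, w = len(img), len(img[0])
    oh, ow = h - kh + 1, w - kw + 1
    out = [[sum(img[i + u][j + v] for u in range(kh) for v in range(kw))
            for j in range(ow)] for i in range(oh)]
    return out, oh * ow * kh * kw


if __name__ == "__main__":
    img = [[r * 8 + c for c in range(8)] for r in range(6)]  # 6 x 8 test image
    out_reuse, reads, peak = window_sum_with_reuse(img, 3, 3)
    out_naive, naive_reads = window_sum_naive(img, 3, 3)
    print(out_reuse == out_naive, naive_reads, reads, peak)  # True 216 96 9
```

Under these assumptions, each row of windows loads kh*kw elements for its first window and kh for every later one. That is kh*w loads per row instead of kh*kw*(w-kw+1), and the buffer never holds more than kh*kw elements.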