Balancing computation with I/O is considered a critical factor in the overall performance of embedded systems in general and reconfigurable computing systems in particular. Data I/O often dominates the overall performance of window operations, which are frequently used in image processing, image compression, pattern recognition and digital signal processing. The problem is more acute in reconfigurable systems, since the compiler must generate both the data path and the sequence of operations. The challenge is to exploit data reuse on the reconfigurable fabric (FPGA) intelligently, minimizing the required memory or I/O bandwidth while maximizing parallelism.

In this paper, we present a compile-time approach to data reuse in window-based codes. The compiler, ROCCC, first analyzes and optimizes the window operation in C. It then computes the size of the hardware buffer and defines three sets of data values for each window: the window set, the managed set and the killed set. This compile-time analysis simplifies HDL code generation and improves the performance of the resulting hardware. We also discuss in-place window operations.