Advanced compiler design and implementation
Advanced compiler design and implementation
Synthesis and Optimization of Digital Circuits
Synthesis and Optimization of Digital Circuits
LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Compilers: Principles, Techniques, and Tools (2nd Edition)
Compilers: Principles, Techniques, and Tools (2nd Edition)
Bit matrix multiplication in commodity processors
ASAP '08 Proceedings of the 2008 International Conference on Application-Specific Systems, Architectures and Processors
Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation
Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation
Small gestures go a long way: how many bits per gesture do recognizers actually need?
Proceedings of the Designing Interactive Systems Conference
USENIX ATC'12 Proceedings of the 2012 USENIX conference on Annual Technical Conference
Proceedings of the 10th Workshop on Optimizations for DSP and Embedded Systems
The impact of motion dimensionality and bit cardinality on the design of 3D gesture recognizers
International Journal of Human-Computer Studies
Hi-index | 0.00 |
Automated hardware design from behavior-level abstraction has drawn wide interest in FPGA-based acceleration and configurable computing research field. However, for many high-level programming languages, such as C/C++, the description of bitwise access and computation is not as direct as hardware description languages, and high-level synthesis of algorithmic descriptions may generate suboptimal implementations for bitwise computation-intensive applications. In this paper we introduce a bit-level transformation and optimization approach to assisting high-level synthesis of algorithmic descriptions. We introduce a bit-flow graph to capture bit-value information. Analysis and optimizing transformations can be performed on this representation, and the optimized results are transformed back to the standard data-flow graphs extended with a few instructions representing bitwise access. This allows high-level synthesis tools to automatically generate circuits with higher quality. Experiments show that our algorithm can reduce slice usage by 29.8% on average for a set of real-life benchmarks on Xilinx Virtex-4 FPGAs. In the meantime, the clock period is reduced by 13.6% on average, with an 11.4% latency reduction.