Design space exploration is a delicate process whose success rests on the designers' shoulders. It often proceeds by trial and error, although some basic metrics can guide it. In this paper, we explore the acceleration of loops from C-based specifications. We built a framework in which a design style, such as software-oriented or application-specific instruction-set processor (ASIP)-oriented design, can be specified. We also propose an exploration process that targets the main aspects limiting acceleration and the actions that can be taken to improve it. The process is based on new loop-oriented metrics that provide insight into key design issues. They help determine which aspects of the design, data accesses or arithmetic logic unit (ALU)/control operations, limit loop acceleration or offer opportunities for it. We profile benchmarks from the signal and image processing fields, such as the Turbo Decoder and JPEG algorithms, to illustrate how loop-oriented metrics point out the aspects that limit or improve loop acceleration. The loop acceleration process was also used to explore architectures that exploit, as far as possible, the loop acceleration opportunities of the sum of absolute differences (SAD) algorithm.
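To make the distinction between data accesses and ALU operations concrete, the following is a minimal C sketch of a SAD loop that tallies both kinds of operation per iteration. It is an illustration only: the `loop_counts` structure, the function names `sad_counted` and `mem_to_alu_ratio`, and the way operations are counted (two loads and three ALU operations per iteration) are assumptions made for this example, not the metric definitions or the profiling framework of the paper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-loop tally; not the paper's metric definitions. */
typedef struct {
    unsigned long mem_ops;  /* loads counted in the loop body */
    unsigned long alu_ops;  /* subtract, absolute value, accumulate */
} loop_counts;

/* Sum of absolute differences over two blocks of n pixels.
 * Adds the assumed operation counts of each iteration to *counts. */
uint32_t sad_counted(const uint8_t *cur, const uint8_t *ref, size_t n,
                     loop_counts *counts)
{
    assert(cur != NULL && ref != NULL && counts != NULL);

    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int diff = (int)cur[i] - (int)ref[i];        /* 2 loads, 1 subtract */
        sum += (uint32_t)(diff < 0 ? -diff : diff);  /* 1 abs, 1 add */
        counts->mem_ops += 2;
        counts->alu_ops += 3;
    }
    return sum;
}

/* Ratio of memory operations to ALU operations for a tallied loop.
 * Returns 0.0 when no ALU operations were counted. */
double mem_to_alu_ratio(const loop_counts *counts)
{
    assert(counts != NULL);
    if (counts->alu_ops == 0) {
        return 0.0;
    }
    return (double)counts->mem_ops / (double)counts->alu_ops;
}
```

Under these assumed counts, a caller that zero-initializes a `loop_counts`, runs `sad_counted` on two blocks, and then calls `mem_to_alu_ratio` obtains 2/3 regardless of block size. A ratio of this kind is one simple way to see whether a loop leans toward data accesses or toward arithmetic.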