Due to the complexity of modern data-parallel applications such as image processing, automatic approaches to infer suitable and efficient hardware realizations are increasingly required. The optimization of the data transfer and storage micro-architecture typically plays a key role in exploiting data parallelism. In this paper, we propose a comprehensive method to explore the mapping of a high-level representation of an application onto a customizable hardware accelerator. The high-level representation is expressed in the Array-OL language. The customizable architecture uses FIFO queues and a double-buffering mechanism to mask the latency of data transfers and external memory accesses. The mapping of the high-level representation onto this architecture is performed by applying a set of loop transformations in Array-OL. A method based on integer partitions is used to reduce the space of explored solutions.
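To illustrate why double buffering can mask transfer latency, the sketch below uses a simplified timing model that is our assumption, not the architecture model of the paper: one transfer engine, one compute unit, and `buffers` local buffers, where block `i` can only be loaded once the buffer previously holding block `i - buffers` has been consumed. The function name `pipeline_makespan` and all cycle counts are hypothetical.

```python
from typing import List, Sequence


def pipeline_makespan(load: Sequence[int], compute: Sequence[int], buffers: int) -> int:
    """Finish time for processing all blocks under a hypothetical model with
    one serial transfer engine, one compute unit and `buffers` local buffers."""
    if len(load) != len(compute) or buffers < 1:
        raise ValueError("need one compute time per block and at least one buffer")
    load_done: List[int] = []
    compute_done: List[int] = []
    for i, (l, c) in enumerate(zip(load, compute)):
        # Transfers are serialized, and block i reuses the buffer of block i - buffers,
        # so its transfer waits until that earlier block has been consumed.
        start_load = max(
            load_done[i - 1] if i >= 1 else 0,
            compute_done[i - buffers] if i >= buffers else 0,
        )
        load_done.append(start_load + l)
        # Computation needs the block's data and a free compute unit.
        start_compute = max(load_done[i], compute_done[i - 1] if i >= 1 else 0)
        compute_done.append(start_compute + c)
    return compute_done[-1] if compute_done else 0


# Four blocks, 4 cycles to transfer and 6 cycles to compute each.
print(pipeline_makespan([4] * 4, [6] * 4, buffers=1))  # 40: transfers and computation alternate
print(pipeline_makespan([4] * 4, [6] * 4, buffers=2))  # 28: later transfers overlap computation
```

With a single buffer every transfer is exposed, giving 4 × (4 + 6) = 40 cycles; with two buffers only the first transfer is exposed, giving 4 + 4 × 6 = 28 cycles under this model.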
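The abstract does not name the individual loop transformations applied in Array-OL. As one representative example, chosen by us as an assumption, loop tiling splits an image traversal into tiles so that each tile can be held in a local buffer of `th * tw` elements. The hypothetical function `tiled_order` below shows that tiling changes only the order of the iterations, not the set of iterations, which is what makes such a transformation legal to explore.

```python
from typing import List, Tuple


def tiled_order(h: int, w: int, th: int, tw: int) -> List[Tuple[int, int]]:
    """Visit an h x w array tile by tile, with tiles of th x tw elements
    (tiles on the right and bottom edges may be smaller)."""
    order: List[Tuple[int, int]] = []
    for ti in range(0, h, th):          # loop over tile rows
        for tj in range(0, w, tw):      # loop over tile columns
            # The two inner loops stay inside one tile, i.e. one local buffer.
            for i in range(ti, min(ti + th, h)):
                for j in range(tj, min(tj + tw, w)):
                    order.append((i, j))
    return order


original = [(i, j) for i in range(5) for j in range(7)]
tiled = tiled_order(5, 7, 2, 3)
print(sorted(tiled) == original)  # True: same iterations, different order
print(tiled[:6])                  # the first 2 x 3 tile, row by row
```

The tile sizes `th` and `tw` are the kind of parameter a design space exploration would vary, since they trade local buffer size against the number of external memory transfers.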
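The abstract does not specify how integer partitions are used to prune the search. A minimal sketch, assuming that a design point is described by how an integer quantity is split into parts whose order does not matter (hypothetically, a number of processing elements shared among parallel units): enumerating partitions instead of all ordered splits (compositions) shrinks the space considerably, since an integer n has 2^(n-1) compositions but far fewer partitions. The functions `partitions` and `compositions` are illustrative names, not taken from the paper.

```python
from typing import Iterator, List, Optional


def partitions(n: int, max_part: Optional[int] = None) -> Iterator[List[int]]:
    """Yield the integer partitions of n as non-increasing lists of parts."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield []
        return
    # Choosing parts in non-increasing order yields each unordered split exactly once.
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield [first] + rest


def compositions(n: int) -> Iterator[List[int]]:
    """Yield every ordered split of n into positive parts (the unpruned space)."""
    if n == 0:
        yield []
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield [first] + rest


print(list(partitions(5)))
for n in (4, 8, 12):
    # n, number of partitions, number of compositions
    print(n, sum(1 for _ in partitions(n)), sum(1 for _ in compositions(n)))
# 4 5 8
# 8 22 128
# 12 77 2048
```

Generating parts in non-increasing order is what removes the symmetric duplicates; whether the paper's method exploits exactly this symmetry is not stated in the abstract.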