Global optimizations for parallelism and locality on scalable parallel machines
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Iterative modulo scheduling: an algorithm for software pipelining loops
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
A recursive algorithm for low-power memory partitioning
ISLPED '00 Proceedings of the 2000 international symposium on Low power electronics and design
Optimizing compilers for modern architectures: a dependence-based approach
Optimizing compilers for modern architectures: a dependence-based approach
Compile-Time Techniques for Data Distribution in Distributed Memory Machines
IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Parallel and Distributed Systems
Power Aware Variable Partitioning and Instruction Scheduling for Multiple Memory Banks
Proceedings of the conference on Design, automation and test in Europe - Volume 1
Memory access scheduling and binding considering energy minimization in multi-bank memory systems
Proceedings of the 41st annual Design Automation Conference
Storage assignment during high-level synthesis for configurable architectures
ICCAD '05 Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design
Lithographic aerial image simulation with FPGA-based hardwareacceleration
Proceedings of the 16th international ACM/SIGDA symposium on Field programmable gate arrays
A compiler approach to managing storage and memory bandwidth in configurable architectures
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Scheduling with soft constraints
Proceedings of the 2009 International Conference on Computer-Aided Design
Automatic memory partitioning and scheduling for throughput and power optimization
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Towards layout-friendly high-level synthesis
Proceedings of the 2012 ACM international symposium on International Symposium on Physical Design
Memory partitioning and scheduling co-optimization in behavioral synthesis
Proceedings of the International Conference on Computer-Aided Design
Using memory profile analysis for automatic synthesis of pointers code
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 0.00 |
Hardware acceleration is crucial in modern embedded system design to meet the explosive demands on performance and cost. Selected computation kernels for acceleration are usually captured by nest loops, which are optimized by state-of-the-art techniques like loop tiling and loop pipelining. However, memory bandwidth bottlenecks prevent designs to reach optimal throughput with respect to available parallelism. In this paper we present an automatic memory partitioning technique which can efficiently improve throughput and reduce energy consumption of pipelined loop kernels for given throughput constraints and platform requirement. Our partition scheme consists of two steps, the first step considers cycle accurate scheduling information to meet the hard constraints on memory bandwidth requirements specifically for synchronized hardware designs. Experimental results show an average 6X throughput improvement on a set of real world designs with moderate area increase (about 45% on average), given that less resource sharing opportunities exist with higher throughput in optimized designs. The second step further partitions the memory banks for reducing the dynamic power consumption of the final design. In contrast with previous approaches, our technique can statically compute memory access frequencies in polynomial time with little to none profiling. Experimental results show about 30% power reduction on the same set of benchmarks.