Memory partitioning and scheduling co-optimization in behavioral synthesis

Authors:
Peng Li;Yuxin Wang;Peng Zhang;Guojie Luo;Tao Wang;Jason Cong
Affiliations:
Peking University, Beijing, China;Peking University, Beijing, China;University of California, Los Angeles, CA;Peking University, Beijing, China;Peking University, Beijing, China and UCLA/PKU Joint Research Institute in Science and Engineering;Peking University, Beijing, China and University of California, Los Angeles, CA and UCLA/PKU Joint Research Institute in Science and Engineering
Venue:
Proceedings of the International Conference on Computer-Aided Design
Year:
2012

Abstract

Achieving optimal throughput by extracting parallelism in behavioral synthesis often exacerbates memory bottlenecks. Data partitioning is an important technique for increasing memory bandwidth: it allows multiple simultaneous memory accesses to be scheduled to different memory banks. In this paper we present a vertical memory partitioning and scheduling algorithm that can generate a valid partition scheme for arbitrary affine memory inputs. It does this by arranging non-conflicting memory accesses across loop iteration boundaries. We also propose a mixed memory partitioning and scheduling algorithm that combines the advantages of the vertical algorithm and other state-of-the-art algorithms. A set of theorems provides criteria for selecting a valid partitioning scheme, followed by an optimal and scalable memory scheduling algorithm. By exploiting the constant stride between memory addresses in successive loop iterations, we propose an address translation optimization for an arbitrary partition factor that improves performance, area and energy efficiency. Experimental results on a set of real-world medical image processing kernels show that the proposed mixed algorithm with address translation optimization achieves a speed-up of 15.8%, an area reduction of 36% and power savings of 32.4%, compared to the state-of-the-art memory partitioning algorithm.
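
To make two ideas from the abstract concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes plain cyclic partitioning (bank = address mod N), which is a common scheme and only a stand-in for the vertical and mixed algorithms described above. The function names (`cyclic_map`, `conflict_free`, `min_banks`, `incremental_addresses`) and the brute-force search are hypothetical and chosen for illustration only. The last function illustrates how a constant stride between successive iterations lets the bank and offset be updated with additions and one comparison instead of a division and modulo per access, which matters when the partition factor is not a power of two.

```python
def cyclic_map(addr, n_banks):
    """Map a flat address to (bank, offset) under cyclic partitioning."""
    return addr % n_banks, addr // n_banks


def conflict_free(accesses, n_banks, n_iters):
    """accesses: list of (stride, base) pairs for affine addresses stride*i + base.

    Returns True if, in every loop iteration i, all accesses fall into
    distinct banks, so they can be issued in the same cycle.
    """
    for i in range(n_iters):
        banks = [(s * i + b) % n_banks for s, b in accesses]
        if len(set(banks)) != len(banks):
            return False
    return True


def min_banks(accesses, n_iters, max_banks=64):
    """Brute-force search (illustrative only) for the smallest conflict-free factor."""
    for n in range(len(accesses), max_banks + 1):
        if conflict_free(accesses, n, n_iters):
            return n
    return None


def incremental_addresses(stride, base, n_banks, n_iters):
    """Yield (bank, offset) for stride*i + base without div/mod inside the loop.

    Assumes stride >= 0 and base >= 0. The divisions below happen once,
    outside the loop; in hardware they would be compile-time constants.
    """
    bank, offset = base % n_banks, base // n_banks
    d_bank, d_off = stride % n_banks, stride // n_banks
    for _ in range(n_iters):
        yield bank, offset
        bank += d_bank
        offset += d_off
        if bank >= n_banks:  # at most one wrap, since bank + d_bank < 2 * n_banks
            bank -= n_banks
            offset += 1
```

For example, a three-point stencil that reads `A[i]`, `A[i+1]` and `A[i+2]` in each iteration corresponds to `accesses = [(1, 0), (1, 1), (1, 2)]`; under the cyclic assumption, `min_banks` reports that three banks are enough for all three reads to proceed in parallel.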