Automatic memory partitioning: increasing memory parallelism via data structure partitioning

Authors:
Yosi Ben-Asher;Nadav Rotem
Affiliations:
Haifa University, Haifa, Israel;Haifa University, Haifa, Israel
Venue:
CODES/ISSS '10 Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Year:
2010

Abstract

In high-level synthesis, pipelined designs are often restricted by the number of memory banks available to the synthesis system. Using multiple memory banks can improve the performance of accelerated applications. Currently, programmers must manually assign data structures to specific memory banks on the accelerator. This paper describes Automatic Memory Partitioning, a method for automatically partitioning data structures into multiple memory banks for increased parallelism and performance. We use source code instrumentation to collect memory traces in order to detect linear memory access patterns. The memory traces are used to split data structures into disjoint memory regions and determine which segments may benefit from parallel memory access. We present an ILP-based algorithm for allocating memory segments into multiple memory banks. Experiments show significant improvements in performance while using a minimal number of memory banks.
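
The two steps named in the abstract, detecting linear access patterns in a memory trace and placing segments that are accessed in parallel into different banks, can be illustrated with a minimal sketch. This is not the paper's implementation: the trace format, the names `linear_pattern` and `assign_banks`, and the sample data are hypothetical, and an exhaustive search stands in for the ILP formulation, which the abstract does not spell out.

```python
from itertools import product


def linear_pattern(addresses):
    """Return (base, stride) if addresses follow base + stride * i, else None."""
    if len(addresses) < 2:
        return None
    stride = addresses[1] - addresses[0]
    for prev, cur in zip(addresses, addresses[1:]):
        if cur - prev != stride:
            return None
    return addresses[0], stride


def assign_banks(segments, conflicts):
    """Map segments to banks so that conflicting segments never share a bank.

    Tries 1, 2, ... banks and returns the first valid assignment, so the
    number of banks used is minimal. Exhaustive search is an assumption made
    for brevity; it is only practical for a handful of segments.
    """
    for num_banks in range(1, len(segments) + 1):
        for choice in product(range(num_banks), repeat=len(segments)):
            banks = dict(zip(segments, choice))
            if all(banks[a] != banks[b] for a, b in conflicts):
                return banks
    return {}


# Hypothetical per-instruction address traces collected by instrumentation.
trace = {
    "load_a": [0, 8, 16, 24],   # linear: base 0, stride 8
    "load_b": [4, 12, 20, 28],  # linear: base 4, stride 8
    "load_c": [0, 4, 12],       # not linear
}
patterns = {name: linear_pattern(addrs) for name, addrs in trace.items()}

# load_a and load_b touch disjoint regions ("even" and "odd" words) in the
# same loop iteration, so the two segments conflict and need separate banks.
banks = assign_banks(["even", "odd"], [("even", "odd")])
```

In this sketch, `load_a` and `load_b` share a stride but differ in base, so they never touch the same address. That is the kind of disjointness that allows a data structure to be split into separately banked regions.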