Compiler-based prefetching for recursive data structures
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Data and memory optimization techniques for embedded systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Cache-efficient memory layout of aggregate data structures
Proceedings of the 14th international symposium on Systems synthesis
An efficient profile-analysis framework for data-layout optimizations
POPL '02 Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Synthesis of hardware models in C with pointers and complex data structures
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - System Level Design
Flow Graph Balancing for Minimizing the Required Memory Bandwidth
ISSS '96 Proceedings of the 9th international symposium on System synthesis
Memory allocation and mapping in high-level synthesis: an integrated approach
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on low power
Exposing Memory Access Regularities Using Object-Relative Memory Profiling
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Input data reuse in compiling window operations onto reconfigurable hardware
Proceedings of the 2004 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Techniques for synthesizing binaries to an advanced register/memory structure
Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
Optimized Generation of Data-Path from C Codes for FPGAs
Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Automatic pool allocation: improving performance by controlling data structure layout in the heap
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Whole execution traces and their applications
ACM Transactions on Architecture and Code Optimization (TACO)
Memory access pattern analysis and stream cache design for multimedia applications
ASP-DAC '03 Proceedings of the 2003 Asia and South Pacific Design Automation Conference
METRIC: Memory tracing via dynamic binary rewriting to identify cache inefficiencies
ACM Transactions on Programming Languages and Systems (TOPLAS)
Valgrind: a framework for heavyweight dynamic binary instrumentation
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Forma: A framework for safe automatic array reshaping
ACM Transactions on Programming Languages and Systems (TOPLAS)
Automatic On-chip Memory Minimization for Data Reuse
FCCM '07 Proceedings of the 15th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
MPADS: memory-pooling-assisted data splitting
Proceedings of the 7th international symposium on Memory management
Compilation Techniques for Reconfigurable Architectures
Compilation Techniques for Reconfigurable Architectures
Compiler-directed scratchpad memory management via graph coloring
ACM Transactions on Architecture and Code Optimization (TACO)
Automatic memory partitioning and scheduling for throughput and power optimization
Proceedings of the 2009 International Conference on Computer-Aided Design
Automatic memory partitioning: increasing memory parallelism via data structure partitioning
CODES/ISSS '10 Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Hi-index | 0.00 |
One of the main advantages of high-level synthesis (HLS) is the ability to synthesize circuits that can access multiple memory banks in parallel. Current HLS systems synthesize parallel memory references based on explicit array declarations in the source code. We consider the need to synthesize not only array references but also memory operations targeting pointers and dynamic data structures. This paper describes Automatic Memory Partitioning, a method for automatically synthesizing general data structures (arrays and pointers) into multiple memory banks for increased parallelism and performance. We use source code instrumentation to collect memory traces in order to detect linear memory access patterns. The memory traces are used to split data structures into disjoint memory regions and determine which segments may benefit from parallel memory access. We present an algorithm for allocating memory segments into multiple memory banks. Experiments show significant improvements in performance while conserving the number of memory banks.