A Framework for Parallelizing Load/Stores on Embedded Processors

Authors:
Xiaotong Zhuang;Santosh Pande;John S. Greenland, Jr.
Venue:
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Year:
2002

Citing 13
Cited 6

Compilers: principles, techniques, and tools
Simulated annealing and Boltzmann machines: a stochastic approach to combinatorial optimization and neural computing
Memory access coalescing: a technique for eliminating redundant memory accesses. PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Memory bank and register allocation in software synthesis for ASIPs. ICCAD '95 Proceedings of the 1995 IEEE/ACM international conference on Computer-aided design
Exploiting dual data-memory banks in digital signal processors. Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Iterated register coalescing. POPL '96 Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Advanced compiler design and implementation
Compiler-controlled memory. Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Enhanced code compression for embedded RISC processors. Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Simultaneous reference allocation in code generation for dual data memory bank ASIPs. ACM Transactions on Design Automation of Electronic Systems (TODAES)
Optimal spilling for CISC machines with few registers. Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Efficient register and memory assignment for non-orthogonal architectures via graph coloring and MST algorithms. Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems
Variable partitioning for dual memory bank DSPs. ICASSP '01 Proceedings of the Acoustics, Speech, and Signal Processing, 200. on IEEE International Conference - Volume 02

Parallelizing load/stores on dual-bank memory embedded processors. ACM Transactions on Embedded Computing Systems (TECS)
Minimizing bank selection instructions for partitioned memory architecture. CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Minimal placement of bank selection instructions for partitioned memory architectures. ACM Transactions on Embedded Computing Systems (TECS)
Analysis and approximation for bank selection instruction minimization on partitioned memory architecture. Proceedings of the ACM SIGPLAN/SIGBED 2010 conference on Languages, compilers, and tools for embedded systems
An Efficient Memory Organization for High-ILP Inner Modem Baseband SDR Processors. Journal of Signal Processing Systems
Analysis and approximation for bank selection instruction minimization on partitioned memory architecture. Journal of Combinatorial Optimization

Abstract

Many modern embedded processors (especially DSPs) support partitioned memory banks (also called X-Y memory or dual-bank memory) along with parallel load/store instructions to achieve code density and/or performance. To use the parallel load/store instructions effectively, the compiler must partition the memory-resident values between the X and Y banks. This paper gives a post-register-allocation solution that merges the generated load/store instructions into their parallel counterparts. Simultaneously, our framework allocates values to the X or Y memory bank.

We first remove as many load/stores and register-register moves as possible using an excellent iterated-coalescing-based register allocator by Appel and George [14]. We then attempt to maximally parallelize the generated load/stores using a multi-pass approach with minimal growth in memory requirements. The first phase of our approach attempts to merge load/stores without replicating values in memory. We model this problem as a graph coloring problem in which each value is colored X or Y. We then construct a Motion Scheduling Graph (MSG) based on the range of motion of each load/store instruction. The MSG reflects which instructions could potentially be merged. We propose a notion of pseudo-fixed boundaries so that load/store movement is minimally affected by register dependencies. We prove that the coloring problem for the MSG is NP-complete. We then propose a heuristic solution, which minimally replicates load/stores on different control flow paths if necessary.

Finally, the remaining load/stores are tackled by register rematerialization, and local conflicts are eliminated. Registers are reassigned to create motion ranges where opportunities for merger are hindered by the local assignment of registers. We show that our framework parallelizes a large number of load/stores without much growth in the data and code segments.
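
To make the bank-coloring idea concrete, here is a minimal Python sketch of assigning values to the X or Y bank so that as many candidate load/store pairs as possible land in different banks and can therefore be merged. It is not the paper's MSG construction or its heuristic: the input format (a list of value pairs), the function names `assign_banks` and `mergeable`, and the greedy highest-degree-first rule are all assumptions chosen for illustration.

```python
from collections import defaultdict


def assign_banks(merge_pairs):
    """Greedily color each memory value 'X' or 'Y' (illustrative heuristic only).

    merge_pairs: list of (value_a, value_b) tuples. Each pair names two values
    whose load/stores could become one parallel instruction, provided the two
    values end up in different banks.
    """
    neighbors = defaultdict(list)
    for a, b in merge_pairs:
        neighbors[a].append(b)
        neighbors[b].append(a)

    bank = {}
    # Visit the most-constrained values first; break ties by name for determinism.
    for v in sorted(neighbors, key=lambda n: (-len(neighbors[n]), n)):
        in_x = sum(1 for n in neighbors[v] if bank.get(n) == "X")
        in_y = sum(1 for n in neighbors[v] if bank.get(n) == "Y")
        # Pick the bank opposite to most of the already-placed partners.
        bank[v] = "Y" if in_x > in_y else "X"
    return bank


def mergeable(merge_pairs, bank):
    """Return the pairs whose two values sit in different banks."""
    return [(a, b) for a, b in merge_pairs if bank[a] != bank[b]]


if __name__ == "__main__":
    pairs = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
    bank = assign_banks(pairs)
    print(bank)
    print(mergeable(pairs, bank))
```

In the example, the values a, b, and c form a triangle, so no two-bank assignment can separate all three pairs; at least one pair must share a bank. Cases like this are where the abstract's later phases (replicating load/stores, rematerialization, register reassignment) would come into play, and the sketch does not attempt them.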