Compilers: principles, techniques, and tools
Combinatorial optimization: algorithms and complexity
Simulated annealing and Boltzmann machines: a stochastic approach to combinatorial optimization and neural computing
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Improvements to graph coloring register allocation
ACM Transactions on Programming Languages and Systems (TOPLAS)
Memory access coalescing: a technique for eliminating redundant memory accesses
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Exploiting dual data-memory banks in digital signal processors
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
POPL '96 Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Graph theory and its applications
Enhanced code compression for embedded RISC processors
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Simultaneous reference allocation in code generation for dual data memory bank ASIPs
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems
A Framework for Parallelizing Load/Stores on Embedded Processors
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Variable partitioning for dual memory bank DSPs
ICASSP '01 Proceedings of the Acoustics, Speech, and Signal Processing, 2001. on IEEE International Conference - Volume 02
Many modern embedded processors such as DSPs support partitioned memory banks (also called X-Y memory or dual-bank memory) along with parallel load/store instructions to achieve higher code density and performance. To use the parallel load/store instructions effectively, the compiler must partition the memory-resident values and assign each of them to the X or Y bank. This paper presents a post-register-allocation solution that merges the generated load/store instructions into their parallel counterparts. At the same time, our framework allocates values to the X or Y memory bank. We first remove as many load/stores and register-register moves as possible using the iterated-coalescing register allocator of Appel and George [1996]. We then attempt to parallelize the generated load/stores using a multipass approach. The basic phase of our approach attempts to merge load/stores without duplication or web splitting. We model this problem as a graph-coloring problem in which each value is colored either X or Y. We then construct a motion scheduling graph (MSG) based on the range of motion of each load/store instruction. The MSG captures the instructions that could potentially be merged. We propose a notion of pseudofixed boundaries so that load/store movement is less affected by register dependencies. We prove that the coloring problem for the MSG is NP-complete and solve it with two heuristic algorithms of differing complexity. We then propose a two-level iterative process that attempts instruction duplication, variable duplication, web splitting, and local conflict elimination to merge the remaining load/stores effectively. Finally, we clean up some multiple-aliased load/stores. To improve performance further, we incorporate profiling information into each stage, with some modifications to the algorithm. We show that our framework parallelizes a large number of load/stores without much growth in the data and code segments.
The average speedup from our optimization pass is roughly 13% when no profile information is available and 17% with profile information. The average growth of the code and data segments stays within 13%.
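
The bank-assignment step described above can be illustrated with a small sketch. It is not the paper's heuristic and does not build a real MSG: it assumes the merge opportunities have already been reduced to a list of hypothetical `(value_a, value_b, weight)` tuples, where two values can only share a parallel load/store if they sit in different banks, and where `weight` stands in for something like a profile-derived execution count. The function names `assign_banks` and `merged_weight` are made up for this illustration. Because the exact problem is NP-complete, the sketch uses a simple greedy pass.

```python
from collections import defaultdict


def assign_banks(merge_candidates):
    """Greedily color each value 'X' or 'Y' (illustrative heuristic only).

    merge_candidates: iterable of (value_a, value_b, weight) tuples. Each tuple
    says a load/store of value_a could be merged with one of value_b, which is
    possible only if the two values are placed in different banks.
    """
    weight = defaultdict(dict)
    for a, b, w in merge_candidates:
        if a == b:
            continue  # a value cannot be in the opposite bank from itself
        weight[a][b] = weight[a].get(b, 0) + w
        weight[b][a] = weight[b].get(a, 0) + w

    # Visit values with the largest total merge weight first (ties by name).
    order = sorted(weight, key=lambda v: (-sum(weight[v].values()), v))

    bank = {}
    for v in order:
        gain = {"X": 0, "Y": 0}
        for u, w in weight[v].items():
            if u in bank:
                # Placing v opposite an already-colored neighbor enables the merge.
                opposite = "Y" if bank[u] == "X" else "X"
                gain[opposite] += w
        bank[v] = "X" if gain["X"] >= gain["Y"] else "Y"
    return bank


def merged_weight(bank, merge_candidates):
    """Total weight of candidate pairs that ended up in different banks."""
    return sum(w for a, b, w in merge_candidates if a != b and bank[a] != bank[b])


if __name__ == "__main__":
    candidates = [("a", "b", 3), ("b", "c", 2), ("a", "c", 1), ("c", "d", 4)]
    banks = assign_banks(candidates)
    print(banks, merged_weight(banks, candidates))
```

In this example the values `a`, `b`, and `c` form a triangle, so no assignment can separate all three pairs; the greedy pass gives up the lightest edge. That kind of leftover conflict is what the later duplication and web-splitting phases in the abstract are meant to address.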