Compilers: principles, techniques, and tools
Compilers: principles, techniques, and tools
Simulated annealing and Boltzmann machines: a stochastic approach to combinatorial optimization and neural computing
Memory access coalescing: a technique for eliminating redundant memory accesses
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Memory bank and register allocation in software synthesis for ASIPs
ICCAD '95 Proceedings of the 1995 IEEE/ACM international conference on Computer-aided design
Exploiting dual data-memory banks in digital signal processors
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
POPL '96 Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Advanced compiler design and implementation
Advanced compiler design and implementation
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Enhanced code compression for embedded RISC processors
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Simultaneous reference allocation in code generation for dual data memory bank ASIPs
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Optimal spilling for CISC machines with few registers
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems
Variable partitioning for dual memory bank DSPs
ICASSP '01 Proceedings of the Acoustics, Speech, and Signal Processing, 200. on IEEE International Conference - Volume 02
Parallelizing load/stores on dual-bank memory embedded processors
ACM Transactions on Embedded Computing Systems (TECS)
Minimizing bank selection instructions for partitioned memory architecture
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Minimal placement of bank selection instructions for partitioned memory architectures
ACM Transactions on Embedded Computing Systems (TECS)
Proceedings of the ACM SIGPLAN/SIGBED 2010 conference on Languages, compilers, and tools for embedded systems
An Efficient Memory Organization for High-ILP Inner Modem Baseband SDR Processors
Journal of Signal Processing Systems
Journal of Combinatorial Optimization
Hi-index | 0.00 |
Many modern embedded processors (esp. DSPs) support partitioned memory banks (also called X-Y memory or dual bank memory) along with parallel load/store instructions to achieve code density and/or performance. In order to effectively utilize the parallel load/store instructions, the compiler must partition the memory resident values into X or Y bank. This paper gives a post-register allocation solution to merge the generated load/store instructions into their parallel counterparts. Simultaneously, our framework performs allocation of values to X or Y memory banks.We first remove as many load/stores and register-register moves through an excellent iterated coalescing based register allocator by Appel and George [14]. We then attempt to maximally parallelize the generated load/stores using a multi-pass approach with minimalgrowth in terms of memory requirements. The first phase of our approach attempts the merger of load stores without replication of values in memory. We model this problem in terms of a graph coloring problem in which each value is colored X or Y. We then construct a Motion Scheduling Graph (MSG) based on the range of motion for each load/store instruction. MSG reflects potential instructions which could be merged. We propose a notion of pseudo-fixedboundaries so that the load/store movement is minimally affected by register dependencies. We prove that the coloring problem for MSG is NP-complete. We then propose a heuristic solution, which minimally replicates load/stores on different control flow paths if necessary.Finally, the remaining load/stores are tackled by register rematerialization and local conflicts are eliminated. Registers are re-assigned to create motion ranges if opportunities are found for merger which are hindered by local assignment of registers. We show that our framework results in parallelization of a large number of load/stores without much growth in data and code segments.