Minimal placement of bank selection instructions for partitioned memory architectures

Authors:
Bernhard Scholz; Bernd Burgstaller; Jingling Xue
Affiliations:
The University of Sydney, Sydney, Australia; Yonsei University, Seoul, Korea; University of New South Wales, Sydney, Australia
Venue:
ACM Transactions on Embedded Computing Systems (TECS)
Year:
2008

Abstract

Bank switching is a technique that increases the addressable code and data memory of microcontrollers without widening the address buses. We have devised an algorithm for the minimal placement of bank selection instructions in partitioned memory architectures. Given a program in which variables have already been assigned to data banks, our optimization minimizes the overhead of bank switching through cost-effective placement of bank selection instructions. The algorithm is parameterizable for a chosen metric: placement can be optimized for speed, code size, or energy, or for a combination of these objectives. We formulated the minimal placement of bank selection instructions as a discrete optimization problem and mapped it to a partitioned Boolean quadratic programming (PBQP) problem. We implemented the optimization as part of a backend for Microchip PIC microcontrollers and evaluated it for several optimization objectives. Our benchmark suite comprises programs from MiBench and DSPStone, a microcontroller real-time kernel, and drivers for microcontroller hardware devices. The optimization reduced program memory size by 2.7% to 18.2% and instruction cycles by 5.0% to 28.8%, and it achieved the minimal solution for all benchmark programs. We also investigated how our approach scales toward the requirements of future generations of microcontrollers, using a worst-case analysis on the entire MiBench suite. The results show that our optimization (1) scales well to larger numbers of memory banks, (2) scales well to the larger problem sizes that future microcontrollers will make feasible, and (3) achieves minimal placement for more than 72% of all functions in MiBench.
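The abstract states only that placement is mapped to a PBQP problem, whose general objective is to pick one choice x_i per node so as to minimize the sum of node costs c_i(x_i) plus the sum of edge costs C_ij(x_i, x_j). The following is a minimal sketch of what such a formulation could look like, under assumptions of our own rather than the paper's actual model. Each program point is assigned an active bank. A node cost vector forbids every bank except the one holding a variable accessed at that point. An edge cost charges for the bank selection instructions needed when the active bank changes along a control-flow edge. The switch cost assumes a PIC16-style scheme where the bank is chosen by two bank-select bits, so each changed bit costs one bcf/bsf instruction. The edge weight w stands in for the chosen metric: 1 for code size, or an execution frequency for cycles. The helper names (`switch_cost`, `requires`, `free`, `solve_pbqp`) are hypothetical. The solver simply enumerates all assignments, which is exponential; it is not a reduction-based PBQP solver.

```python
import itertools
import math

INF = math.inf  # marks a bank that is illegal at a program point


def switch_cost(a, b):
    """Hypothetical PIC16-style cost: one bcf/bsf per bank-select bit that differs."""
    return bin(a ^ b).count("1")


def requires(bank, num_banks):
    """Node cost vector for a point that accesses a variable in `bank`."""
    return [0 if b == bank else INF for b in range(num_banks)]


def free(num_banks):
    """Node cost vector for a point with no banked access: any bank is fine."""
    return [0] * num_banks


def solve_pbqp(num_banks, node_costs, edges):
    """Minimise sum_i c_i(x_i) + sum_(i,j,w) w * switch_cost(x_i, x_j).

    Exhaustive enumeration: exponential in the number of nodes, so this only
    illustrates the objective; it is not a reduction-based PBQP solver.
    Returns (minimal cost, tuple of chosen banks per node).
    """
    n = len(node_costs)
    best_cost, best_sel = INF, None
    for sel in itertools.product(range(num_banks), repeat=n):
        cost = sum(node_costs[i][sel[i]] for i in range(n))
        if cost == INF:
            continue  # some node is in an illegal bank
        cost += sum(w * switch_cost(sel[i], sel[j]) for i, j, w in edges)
        if cost < best_cost:  # strict '<' keeps the first optimum found
            best_cost, best_sel = cost, sel
    return best_cost, best_sel


# Hypothetical diamond CFG with four banks: 0 -> {1, 2} -> 3.
nodes = [requires(0, 4), requires(3, 4), free(4), requires(1, 4)]
edges = [(0, 1, 1), (0, 2, 1), (1, 3, 1), (2, 3, 1)]  # weight 1: code size
cost, selection = solve_pbqp(4, nodes, edges)
print(cost, selection)  # 4 (0, 3, 0, 1)
```

On this toy graph the minimum is four bit-set/clear instructions. Two go on the edge into the bank-3 block, and one goes on each edge into the bank-1 join, while the bank-free block stays in bank 0. Replacing the unit edge weights with profiled execution counts would make the same model minimize executed instructions instead of code size.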