Optimizations aimed at reducing the impact of memory operations on execution speed have long concentrated on improving cache performance. These efforts achieve a reasonable level of success. The primary limit on the compiler's ability to improve memory behavior is its imperfect knowledge about the run-time behavior of the program. The compiler cannot completely predict run-time access patterns.

There is an exception to this rule. During the register allocation phase, the compiler often must insert substantial amounts of spill code; that is, instructions that move values from registers to memory and back again. Because the compiler itself inserts these memory instructions, it has more knowledge about them than about other memory operations in the program.

Spill-code operations are disjoint from the memory manipulations required by the semantics of the program being compiled, and, indeed, the two can interfere in the cache. This paper proposes a hardware solution to the problem of increased spill costs: a small compiler-controlled memory (CCM) to hold spilled values. This small random-access memory can (and should) be placed in a distinct address space from the main memory hierarchy. The compiler can target spill instructions to use the CCM, moving most compiler-inserted memory traffic out of the pathway to main memory and eliminating any impact that those spill instructions would have on the state of the main memory hierarchy. Such memories already exist on some DSP microprocessors. Our techniques can be applied directly on those chips.

This paper presents two compiler-based methods to exploit such a memory, along with experimental results showing that speedups from using CCM may be sizable. It shows that using the register allocator's coloring paradigm to assign spilled values to memory can greatly reduce the amount of memory required by a program.
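The idea of coloring spilled values into memory slots can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes each spilled value has a single half-open live range given as instruction indices, and it uses a simple greedy coloring over those ranges. The names `interferes`, `color_spill_slots`, and the sample live ranges are hypothetical, chosen only for illustration.

```python
from typing import Dict, Tuple

LiveRange = Tuple[int, int]  # half-open interval [start, end), an assumption of this sketch


def interferes(a: LiveRange, b: LiveRange) -> bool:
    """Two spilled values interfere if their live ranges overlap."""
    return a[0] < b[1] and b[0] < a[1]


def color_spill_slots(live_ranges: Dict[str, LiveRange]) -> Dict[str, int]:
    """Greedily assign each spilled value the lowest CCM slot not used
    by any already-placed value whose live range overlaps it."""
    slots: Dict[str, int] = {}
    # Visit values in order of live-range start (ties broken by end).
    for name in sorted(live_ranges, key=lambda n: live_ranges[n]):
        taken = {
            slots[other]
            for other in slots
            if interferes(live_ranges[name], live_ranges[other])
        }
        slot = 0
        while slot in taken:
            slot += 1
        slots[name] = slot
    return slots


# Hypothetical spilled values and their live ranges.
spilled = {"a": (0, 4), "b": (2, 6), "c": (5, 9), "d": (7, 10)}
slots = color_spill_slots(spilled)
ccm_words = max(slots.values()) + 1

print(slots)      # slot chosen for each spilled value
print(ccm_words)  # number of CCM words needed
```

In this example the four spilled values fit in two slots, because `a` and `c` are never live at the same time, and neither are `b` and `d`. Giving every spilled value its own slot would need four. That sharing is the source of the memory reduction the abstract describes, and it matters most when the CCM is small.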