In current memory hierarchies, registers and cache both bridge the access-latency gap between the fast processor(s) and the slow main memory. While registers are managed by the compiler using program flow analysis, cache is mainly controlled by hardware without any understanding of the program. This lack of coordination in managing the two memory structures causes a significant loss of system performance because:

(1) Cache space is wasted holding inaccessible copies of values that reside in registers.
(2) Inaccessible copies of values displace accessible ones from the cache.

Although register allocation has long recognized the benefits of live range analysis, current cache management completely ignores live range information.

In this paper, we propose a unified management of registers and cache using liveness and cache bypass. By using a single model to manage these two memory structures, most redundant copies of values in cache can be eliminated. Consequently, bus traffic and memory traffic in the data cache are greatly reduced and cache effectiveness is improved.
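The effect of liveness-guided cache bypass can be illustrated with a toy simulation. The sketch below is not the paper's algorithm. It assumes a fully associative LRU cache with one-word lines and a hypothetical per-access hint, `dead_after`, standing in for what a compiler's live range analysis might supply (for example, the value now lives in a register, so the cached copy will not be referenced again). The names `simulate` and `trace` are illustrative only.

```python
from collections import OrderedDict


def simulate(trace, capacity, use_liveness):
    """Count misses for a fully associative LRU cache of `capacity` lines.

    trace: list of (address, dead_after) pairs. dead_after=True is a
    hypothetical compiler hint that the cached copy will not be used again.
    """
    cache = OrderedDict()  # keys are addresses, ordered from LRU to MRU
    misses = 0
    for addr, dead_after in trace:
        hit = addr in cache
        if not hit:
            misses += 1
        if use_liveness and dead_after:
            # Bypass: do not allocate on a miss, and free the line on a hit.
            cache.pop(addr, None)
            continue
        if hit:
            cache.move_to_end(addr)  # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used line
            cache[addr] = True
    return misses


# A and B are reused; X and Y are touched once and are dead afterwards.
trace = [("A", False), ("B", False), ("X", True),
         ("A", False), ("B", False), ("Y", True),
         ("A", False), ("B", False)]

plain_misses = simulate(trace, capacity=2, use_liveness=False)
bypass_misses = simulate(trace, capacity=2, use_liveness=True)
```

On this hand-built trace with a two-line cache, plain LRU misses on all 8 accesses because the dead values X and Y keep evicting A and B, while the bypassing variant misses only 4 times. The trace is constructed to make the point and says nothing about the improvement on real programs.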