High-bandwidth data memory systems for superscalar processors
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
An architecture for software-controlled data prefetching
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Data prefetching in multiprocessor vector cache memories
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
IMPACT: an architectural framework for multiple-instruction-issue processors
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Data access microarchitectures for superscalar processors with compiler-assisted data prefetching
MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
An effective on-chip preloading scheme to reduce data access penalty
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Tolerating data access latency with register preloading
ICS '92 Proceedings of the 6th international conference on Supercomputing
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
A modified approach to data cache management
Proceedings of the 28th annual international symposium on Microarchitecture
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
ACM Computing Surveys (CSUR)
Software methods for improvement of cache performance on supercomputer applications
Software methods for improvement of cache performance on supercomputer applications
Run-time spatial locality detection and optimization
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Streamlining inter-operation memory communication via data dependence prediction
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Utilizing reuse information in data cache management
ICS '98 Proceedings of the 12th international conference on Supercomputing
Using prediction to accelerate coherence protocols
Proceedings of the 25th annual international symposium on Computer architecture
Exploiting spatial locality in data caches using spatial footprints
Proceedings of the 25th annual international symposium on Computer architecture
Load latency tolerance in dynamically scheduled processors
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Capturing dynamic memory reference behavior with adaptive cache topology
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
A look at several memory management units, TLB-refill mechanisms, and page table organizations
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Power and performance tradeoffs using various caching strategies
ISLPED '98 Proceedings of the 1998 international symposium on Low power electronics and design
Functional Implementation Techniques for CPU Cache Memories
IEEE Transactions on Computers - Special issue on cache memory and related problems
A locality sensitive multi-module cache with explicit management
ICS '99 Proceedings of the 13th international conference on Supercomputing
Using dynamic cache management techniques to reduce energy in a high-performance processor
ISLPED '99 Proceedings of the 1999 international symposium on Low power electronics and design
Hardware identification of cache conflict misses
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Active Management of Data Caches by Exploiting Reuse Information
IEEE Transactions on Computers
IEEE Transactions on Computers
IEEE Transactions on Computers
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Power-aware partitioned cache architectures
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Runtime identification of cache conflict misses: The adaptive miss buffer
ACM Transactions on Computer Systems (TOCS)
Designing a Modern Memory Hierarchy with Hardware Prefetching
IEEE Transactions on Computers
Increasing power efficiency of multi-core network processors through data filtering
CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems
An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Splitting the Data Cache: A Survey
IEEE Concurrency
Random-Access Data Storage Components in Customized Architectures
IEEE Design & Test
Partitioned instruction cache architecture for energy efficiency
ACM Transactions on Embedded Computing Systems (TECS)
Compiler-Directed Cache Assist Adaptivity
ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
A Power Efficient Cache Structure for Embedded Processors Based on the Dual Cache Structure
LCTES '00 Proceedings of the ACM SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded Systems
Compiler-Directed Cache Line Size Adaptivity
IMS '00 Revised Papers from the Second International Workshop on Intelligent Memory Systems
Improving cache hit ratio by extended referencing cache lines
Journal of Computing Sciences in Colleges
Highly accurate and efficient evaluation of randomising set index functions
Journal of Systems Architecture: the EUROMICRO Journal
Optimal Replacement Is NP-Hardfor Nonstandard Caches
IEEE Transactions on Computers
Distance Associativity for High-Performance Energy-Efficient Non-Uniform Cache Architectures
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Self-correcting LRU replacement policies
Proceedings of the 1st conference on Computing frontiers
Reducing traffic generated by conflict misses in caches
Proceedings of the 1st conference on Computing frontiers
An Integrated Approach for Improving Cache Behavior
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
A sample-based cache mapping scheme
LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Compilation techniques for energy reduction in horizontally partitioned cache architectures
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
MEDEA '04 Proceedings of the 2004 workshop on MEmory performance: DEaling with Applications , systems and architecture
Journal of Systems Architecture: the EUROMICRO Journal
International Journal of Parallel Programming
Victim management in a cache hierarchy
IBM Journal of Research and Development - Advanced silicon technology
Page mapping for heterogeneously partitioned caches: Complexity and heuristics
Journal of Embedded Computing - Cache exploitation in embedded systems
Improving SDRAM access energy efficiency for low-power embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
A compiler-in-the-loop framework to explore horizontally partitioned cache architectures
Proceedings of the 2008 Asia and South Pacific Design Automation Conference
Capturing and optimizing the interactions between prefetching and cache line turnoff
Microprocessors & Microsystems
Less reused filter: improving l2 cache performance via filtering less reused lines
Proceedings of the 23rd international conference on Supercomputing
SieveStore: a highly-selective, ensemble-level disk cache for cost-performance
Proceedings of the 37th annual international symposium on Computer architecture
Two management approaches of the split data cache in multiprocessor systems
EURO-PDP'00 Proceedings of the 8th Euromicro conference on Parallel and distributed processing
SSD bufferpool extensions for database systems
Proceedings of the VLDB Endowment
Reducing off-chip memory traffic by selective cache management scheme in GPGPUs
Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units
FLEXclusion: balancing cache capacity and on-chip bandwidth via flexible exclusion
Proceedings of the 39th Annual International Symposium on Computer Architecture
The locality-aware adaptive cache coherence protocol
Proceedings of the 40th Annual International Symposium on Computer Architecture
Hi-index | 0.01 |
Improvements in main memory speeds have not kept pace with increasing processor clock frequency and improved exploitation of instruction-level parallelism. Consequently, the gap between processor and main memory performance is expected to grow, increasing the number of execution cycles spent waiting for memory accesses to complete. One solution to this growing problem is to reduce the number of cache misses by increasing the effectiveness of the cache hierarchy. In this paper we present a technique for dynamic analysis of program data access behavior, which is then used to proactively guide the placement of data within the cache hierarchy in a location-sensitive manner. We introduce the concept of a macroblock, which allows us to feasibly characterize the memory locations accessed by a program, and a Memory Address Table, which performs the dynamic reference analysis. Our technique is fully compatible with existing Instruction Set Architectures. Results from detailed simulations of several integer programs show significant speedups.