Cache miss characterization models such as the three Cs model are useful in developing schemes to reduce cache misses and their penalty. In this paper we propose the OPT model, which uses cache simulation under optimal (OPT) replacement to obtain a finer and more accurate characterization of misses than the three Cs model. However, current methods for optimal cache simulation are slow and difficult to use. We present three new techniques for optimal cache simulation. First, we propose a limited-lookahead strategy with error fixing, which allows one-pass simulation of multiple optimal caches. Second, we propose a scheme to group entries in the OPT stack, which allows efficient tree-based fully-associative cache simulation under OPT. Third, we propose a scheme for exploiting partial inclusion in set-associative cache simulation under OPT. Simulators based on these algorithms were used to obtain cache miss characterizations using the OPT model for nine SPEC benchmarks. The results indicate that miss ratios under OPT are substantially lower than those under LRU replacement: up to 70% lower in fully-associative caches and up to 32% lower in two-way set-associative caches.
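The OPT stack mentioned above depends on the fact that optimal replacement, like LRU, is a stack algorithm. A single stack update per reference therefore gives the miss count for every fully-associative cache size at once. The Python sketch below illustrates that property under simplifying assumptions. It precomputes each reference's next-use time with a full backward pass over the trace, which amounts to unlimited lookahead rather than the limited lookahead with error fixing proposed here. It keeps the stack as a plain list instead of grouping entries in a tree, so it costs O(N·M) for N references and M distinct blocks. It also simulates set-associative caches naively, one set at a time, with an assumed index function `block % num_sets`, instead of exploiting partial inclusion. All function names are hypothetical.

```python
import math
from typing import Dict, Hashable, List, Sequence


def next_use_times(trace: Sequence[Hashable]) -> List[float]:
    """next_use[t] = index of the next reference to trace[t], or inf if none."""
    next_use: List[float] = [math.inf] * len(trace)
    seen: Dict[Hashable, int] = {}
    for t in range(len(trace) - 1, -1, -1):  # backward pass = unlimited lookahead
        next_use[t] = seen.get(trace[t], math.inf)
        seen[trace[t]] = t
    return next_use


def opt_stack_distances(trace: Sequence[Hashable]) -> List[float]:
    """One-pass priority-stack simulation of OPT for all fully-associative sizes.

    Invariant: the top C stack entries are the contents of a C-entry OPT cache,
    so a reference hits in that cache iff its 1-based stack distance is <= C.
    """
    next_use = next_use_times(trace)
    priority: Dict[Hashable, float] = {}  # block -> time of its next reference
    stack: List[Hashable] = []            # stack[0] is the top
    dists: List[float] = []
    for t, x in enumerate(trace):
        present = x in stack
        d = stack.index(x) if present else len(stack)
        dists.append(d + 1 if present else math.inf)
        priority[x] = next_use[t]
        if not stack:
            stack.append(x)
            continue
        if d == 0:  # already on top: no reordering needed
            continue
        carry = stack[0]
        stack[0] = x
        for i in range(1, d):
            # The block needed sooner keeps slot i; the other sinks further down.
            if priority[stack[i]] > priority[carry]:
                stack[i], carry = carry, stack[i]
        if present:
            stack[d] = carry  # fill the slot that x vacated
        else:
            stack.append(carry)
    return dists


def lru_stack_distances(trace: Sequence[Hashable]) -> List[float]:
    """Classic LRU stack distances, for comparison with OPT."""
    stack: List[Hashable] = []
    dists: List[float] = []
    for x in trace:
        if x in stack:
            d = stack.index(x)
            dists.append(d + 1)
            del stack[d]
        else:
            dists.append(math.inf)
        stack.insert(0, x)
    return dists


def misses(dists: Sequence[float], size: int) -> int:
    """Misses in a fully-associative cache with `size` entries."""
    return sum(1 for d in dists if d > size)


def set_assoc_opt_misses(trace: Sequence[int], num_sets: int, ways: int) -> int:
    """OPT misses for a num_sets x ways cache, assuming set index = block % num_sets.

    Each set is simulated separately as a fully-associative OPT cache over the
    sub-trace of blocks that map to it (no partial-inclusion optimization).
    """
    total = 0
    for s in range(num_sets):
        sub = [b for b in trace if b % num_sets == s]
        total += misses(opt_stack_distances(sub), ways)
    return total


if __name__ == "__main__":
    trace = list("abcabcabc")
    print("OPT misses, 2 entries:", misses(opt_stack_distances(trace), 2))
    print("LRU misses, 2 entries:", misses(lru_stack_distances(trace), 2))
```

For the cyclic trace `a b c a b c a b c` with a two-entry cache, this sketch reports 6 misses under OPT and 9 under LRU. LRU always evicts the block that is about to be reused, while OPT evicts the block whose next reference is farthest away.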