ATUM: a new technique for capturing address traces using microcode
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Line (block) size choice for CPU cache memories
IEEE Transactions on Computers
Cache performance of operating system and multiprogramming workloads
ACM Transactions on Computer Systems (TOCS)
On the inclusion properties for multi-level cache hierarchies
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Performance tradeoffs in cache design
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
A Case for Direct-Mapped Caches
Computer
ACM Transactions on Computer Systems (TOCS)
Cache evaluation and the impact of workload choice
ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
ACM Computing Surveys (CSUR)
Cache Performance in the VAX-11/780
ACM Transactions on Computer Systems (TOCS)
Cold-start vs. warm-start miss ratios
Communications of the ACM
Cache memory performance in a unix enviroment
ACM SIGARCH Computer Architecture News
Bibliography and reading on CPU cache memories and related topics
ACM SIGARCH Computer Architecture News
Cache memories for PDP-11 family computers
ISCA '76 Proceedings of the 3rd annual symposium on Computer architecture
The Memory Architecture and the Cache and Memory Management Unit for
The Memory Architecture and the Cache and Memory Management Unit for
Efficient Analysis of Caching Systems
Efficient Analysis of Caching Systems
Aspects of Cache Memory and Instruction
Aspects of Cache Memory and Instruction
Analysis of cache replacement-algorithms
Analysis of cache replacement-algorithms
Cache performance of the integer SPEC benchmarks on a RISC
ACM SIGARCH Computer Architecture News
Efficient trace-driven simulation method for cache performance analysis
SIGMETRICS '90 Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Blocking: exploiting spatial locality for trace compaction
SIGMETRICS '90 Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems
The effect of context switches on cache performance
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Efficient trace-driven simulation methods for cache performance analysis
ACM Transactions on Computer Systems (TOCS)
Fast instruction cache performance evaluation using compile-time analysis
SIGMETRICS '92/PERFORMANCE '92 Proceedings of the 1992 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Synthetic Traces for Trace-Driven Simulation of Cache Memories
IEEE Transactions on Computers
Page placement algorithms for large real-indexed caches
ACM Transactions on Computer Systems (TOCS)
Tradeoffs in supporting two page sizes
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
A Model of Workloads and its Use in Miss-Rate Prediction for Fully Associative Caches
IEEE Transactions on Computers
A case for two-way skewed-associative caches
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Column-associative caches: a technique for reducing the miss rate of direct-mapped caches
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Wisconsin Architectural Research Tool Set
ACM SIGARCH Computer Architecture News
Efficient simulation of caches under optimal replacement with applications to miss characterization
SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Cache inclusion and processor sampling in multiprocessor simulations
SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Memory subsystem performance of programs using copying garbage collection
POPL '94 Proceedings of the 21st ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Compiling for shared-memory and message-passing computers
ACM Letters on Programming Languages and Systems (LOPLAS)
The effectiveness of caches for vector processors
ICS '94 Proceedings of the 8th international conference on Supercomputing
Guarded execution and branch prediction in dynamic ILP processors
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Tempest and typhoon: user-level shared memory
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Decoupled sectored caches: conciliating low tag implementation cost
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Set-associative cache simulation using generalized binomial trees
ACM Transactions on Computer Systems (TOCS)
Memory system performance of programs with intensive heap allocation
ACM Transactions on Computer Systems (TOCS)
A trace-driven simulation methodology
ACM SIGARCH Computer Architecture News
Active memory: a new abstraction for memory-system simulation
Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Direct-mapped versus set-associative pipelined caches
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Partial resolution in branch target buffers
Proceedings of the 28th annual international symposium on Microarchitecture
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
An educational tool for testing hierarchical multilevel caches
ACM SIGARCH Computer Architecture News
Thread scheduling for cache locality
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
A quantitative analysis of loop nest locality
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Active memory: a new abstraction for memory system simulation
ACM Transactions on Modeling and Computer Simulation (TOMACS)
IEEE Transactions on Computers
Trace-driven memory simulation: a survey
ACM Computing Surveys (CSUR)
The selection of optimal cache lines for microprocessor-based controllers
MICRO 23 Proceedings of the 23rd annual workshop and symposium on Microprogramming and microarchitecture
Control flow prediction for dynamic ILP processors
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Cache behavior of network protocols
SIGMETRICS '97 Proceedings of the 1997 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
A compiler algorithm for optimizing locality in loop nests
ICS '97 Proceedings of the 11th international conference on Supercomputing
Cache miss equations: an analytical representation of cache misses
ICS '97 Proceedings of the 11th international conference on Supercomputing
Partial Resolution in Branch Target Buffers
IEEE Transactions on Computers
An empirical study of the effects of careful page placement in Linux
ACM-SE 36 Proceedings of the 36th annual Southeast regional conference
Combining Trace Sampling with Single Pass Methods for Efficient Cache Simulation
IEEE Transactions on Computers
Tempest and typhoon: user-level shared memory
25 years of the international symposia on Computer architecture (selected papers)
Improving locality using loop and data transformations in an integrated framework
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Cache-conscious data placement
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Functional Implementation Techniques for CPU Cache Memories
IEEE Transactions on Computers - Special issue on cache memory and related problems
Improving Cache Locality by a Combination of Loop and Data Transformations
IEEE Transactions on Computers - Special issue on cache memory and related problems
Analysis of Temporal-Based Program Behavior for Improved Instruction Cache Performance
IEEE Transactions on Computers - Special issue on cache memory and related problems
A Linear Algebra Framework for Automatic Determination of Optimal Data Layouts
IEEE Transactions on Parallel and Distributed Systems
Cache conscious programming in undergraduate computer science
SIGCSE '99 The proceedings of the thirtieth SIGCSE technical symposium on Computer science education
Iterative cache simulation of embedded CPUs with trace stripping
CODES '99 Proceedings of the seventh international workshop on Hardware/software codesign
Cache-conscious structure layout
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
The pool of subsectors cache design
ICS '99 Proceedings of the 13th international conference on Supercomputing
Optimal replacements in caches with two miss costs
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
Recursive array layouts and fast parallel matrix multiplication
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
Irreproducible benchmarks might be sometimes helpful
Proceedings of the 8th ACM SIGOPS European workshop on Support for composing distributed applications
Generation and analysis of very long address traces
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Cache miss equations: a compiler framework for analyzing and tuning memory behavior
ACM Transactions on Programming Languages and Systems (TOPLAS)
Parallel trace-driven cache simulation by time partitioning
WSC' 90 Proceedings of the 22nd conference on Winter simulation
Quantifying loop nest locality using SPEC'95 and the perfect benchmarks
ACM Transactions on Computer Systems (TOCS)
Co-design of interleaved memory systems
CODES '00 Proceedings of the eighth international workshop on Hardware/software codesign
CODES '00 Proceedings of the eighth international workshop on Hardware/software codesign
Cache performance for multimedia applications
ICS '01 Proceedings of the 15th international conference on Supercomputing
Exact analysis of the cache behavior of nested loops
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Tuning Strassen's matrix multiplication for memory efficiency
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Timed compiled-code simulation of embedded software for performance analysis of SOC design
Proceedings of the 39th annual Design Automation Conference
Timekeeping in the memory system: predicting and optimizing memory behavior
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Towards a theory of cache-efficient algorithms
Journal of the ACM (JACM)
Correction to "Evaluating Associativity in CPU Caches"
IEEE Transactions on Computers
Performance Implications of Tolerating Cache Faults
IEEE Transactions on Computers
A Quantitative Evaluation of Cache Types for High-Performance Computer Systems
IEEE Transactions on Computers
A Comparison of Trace-Sampling Techniques for Multi-Megabyte Caches
IEEE Transactions on Computers
IEEE Transactions on Computers
Efficient Stack Simulation for Set-Associative Virtual Address Caches With Real Tags
IEEE Transactions on Computers
Massively Parallel Algorithms for Trace-Driven Cache Simulations
IEEE Transactions on Parallel and Distributed Systems
Stack Evaluation of Arbitrary Set-Associative Multiprocessor Caches
IEEE Transactions on Parallel and Distributed Systems
Recursive Array Layouts and Fast Matrix Multiplication
IEEE Transactions on Parallel and Distributed Systems
The set-associative cache performance of search trees
SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
Cache-Efficient Multigrid Algorithms
ICCS '01 Proceedings of the International Conference on Computational Sciences-Part I
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Compile-Time Based Performance Prediction
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
RECET - A Real-Time Cache Evaluation Tool
MASCOTS '95 Proceedings of the 3rd International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Trace-Driven Memory Simulation: A Survey
Performance Evaluation: Origins and Directions
Cache Conscious Algorithms for Relational Query Processing
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Improving cache hit ratio by extended referencing cache lines
Journal of Computing Sciences in Colleges
Calculating stack distances efficiently
Proceedings of the 2002 workshop on Memory system performance
Estimating cache misses and locality using stack distances
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
The impact of extrinsic cache performance on predictability of real-time systems
RTCSA '95 Proceedings of the 2nd International Workshop on Real-Time Computing Systems and Applications
Caches versus object allocation
IWOOOS '96 Proceedings of the 5th International Workshop on Object Orientation in Operating Systems (IWOOOS '96)
Highly accurate and efficient evaluation of randomising set index functions
Journal of Systems Architecture: the EUROMICRO Journal
A Quantitative Analysis of Tile Size Selection Algorithms
The Journal of Supercomputing
High level cache simulation for heterogeneous multiprocessors
Proceedings of the 41st annual Design Automation Conference
PB-LRU: a self-tuning power aware storage cache replacement algorithm for conserving disk energy
Proceedings of the 18th annual international conference on Supercomputing
Design space exploration of caches using compressed traces
Proceedings of the 18th annual international conference on Supercomputing
A proposal for input-sensitivity analysis of profile-driven optimizations on embedded applications
MEDEA '03 Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications , systems and architecture
Dynamic tracking of page miss ratio curve for memory management
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Power-Aware Storage Cache Management
IEEE Transactions on Computers
Cache-Conscious Automata for XML Filtering
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
IEEE Transactions on Computers
Generating cache hints for improved program efficiency
Journal of Systems Architecture: the EUROMICRO Journal
A non-uniform cache architecture for low power system design
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Cache-Efficient Multigrid Algorithms
International Journal of High Performance Computing Applications
Reordering for cache conscious photon mapping
GI '05 Proceedings of Graphics Interface 2005
Multiple Page Size Modeling and Optimization
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Optimizing instruction cache performance of embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
Finding optimal L1 cache configuration for embedded systems
ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
ACM-SE 33 Proceedings of the 33rd annual on Southeast regional conference
A cache-defect-aware code placement algorithm for improving the performance of processors
ICCAD '05 Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design
Ultra low-cost defect protection for microprocessor pipelines
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Cache-Conscious Automata for XML Filtering
IEEE Transactions on Knowledge and Data Engineering
A memory model for scientific algorithms on graphics processors
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
CMP cache performance projection: accessibility vs. capacity
ACM SIGARCH Computer Architecture News
Automated design of application specific superscalar processors: an analytical approach
Proceedings of the 34th annual international symposium on Computer architecture
A one-shot configurable-cache tuner for improved energy and performance
Proceedings of the conference on Design, automation and test in Europe
Instruction trace compression for rapid instruction cache simulation
Proceedings of the conference on Design, automation and test in Europe
Low-cost protection for SER upsets and silicon defects
Proceedings of the conference on Design, automation and test in Europe
An interactive, visual simulator for the DLX pipeline
WCAE-3 '97 Proceedings of the 1997 workshop on Computer architecture education
Communications of the ACM
Cache-efficient numerical algorithms using graphics hardware
Parallel Computing
Reducing Photon-Mapping Bandwidth by Query Reordering
IEEE Transactions on Visualization and Computer Graphics
Adaptive set pinning: managing shared caches in chip multiprocessors
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
A table-based method for single-pass cache optimization
Proceedings of the 18th ACM Great Lakes symposium on VLSI
AES Encryption Implementation and Analysis on Commodity Graphics Processing Units
CHES '07 Proceedings of the 9th international workshop on Cryptographic Hardware and Embedded Systems
Static analysis for fast and accurate design space exploration of caches
CODES+ISSS '08 Proceedings of the 6th IEEE/ACM/IFIP international conference on Hardware/Software codesign and system synthesis
How to Write Fast Numerical Code: A Small Introduction
Generative and Transformational Techniques in Software Engineering II
Measuring the Normality of Web Proxies' Behavior Based on Locality Principles
NPC '08 Proceedings of the IFIP International Conference on Network and Parallel Computing
Roofline: an insightful visual performance model for multicore architectures
Communications of the ACM - A Direct Path to Dependable Software
Exact and fast L1 cache simulation for embedded systems
Proceedings of the 2009 Asia and South Pacific Design Automation Conference
HASS: a scheduler for heterogeneous multicore systems
ACM SIGOPS Operating Systems Review
Program locality analysis using reuse distance
ACM Transactions on Programming Languages and Systems (TOPLAS)
Optimization of a lattice Boltzmann computation on state-of-the-art multicore platforms
Journal of Parallel and Distributed Computing
SuSeSim: a fast simulation strategy to find optimal L1 cache configuration for embedded systems
CODES+ISSS '09 Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis
Algorithms for memory hierarchies: advanced lectures
Algorithms for memory hierarchies: advanced lectures
Performance of large low-associativity caches
ACM SIGMETRICS Performance Evaluation Review
Model-guided empirical tuning of loop fusion
International Journal of High Performance Systems Architecture
Proceedings of the 47th Design Automation Conference
DEW: a fast level 1 cache simulation approach for embedded processors with FIFO replacement policy
Proceedings of the Conference on Design, Automation and Test in Europe
Improved procedure placement for set associative caches
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Journal of Parallel and Distributed Computing
Algorithm engineering: bridging the gap between algorithm theory and practice
Algorithm engineering: bridging the gap between algorithm theory and practice
Redesigning the string hash table, burst trie, and BST to exploit cache
Journal of Experimental Algorithmics (JEA)
The ZCache: Decoupling Ways and Associativity
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
All-window profiling and composable models of cache sharing
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
T-SPaCS: a two-level single-pass cache simulation methodology
Proceedings of the 16th Asia and South Pacific Design Automation Conference
Stack distance based worst-case instruction cache performance analysis
Proceedings of the 2011 ACM Symposium on Applied Computing
On the theory and potential of LRU-MRU collaborative cache management
Proceedings of the international symposium on Memory management
Cache index-aware memory allocation
Proceedings of the international symposium on Memory management
HC-Sim: a fast and exact l1 cache simulator with scratchpad memory co-simulation support
CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
A cache-pinning strategy for improving generational garbage collection
HiPC'06 Proceedings of the 13th international conference on High Performance Computing
A cache-conscious profitability model for empirical tuning of loop fusion
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
CIPARSim: cache intersection property assisted rapid single-pass FIFO cache simulation technique
Proceedings of the International Conference on Computer-Aided Design
Why on-chip cache coherence is here to stay
Communications of the ACM
Reducing OLTP instruction misses with thread migration
DaMoN '12 Proceedings of the Eighth International Workshop on Data Management on New Hardware
Characterizing and improving the use of demand-fetched caches in GPUs
Proceedings of the 26th ACM international conference on Supercomputing
Cache Conscious Task Regrouping on Multicore Processors
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
ASCIB: adaptive selection of cache indexing bits for removing conflict misses
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Optimization of geometric multigrid for emerging multi- and manycore processors
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
SLICC: Self-Assembly of Instruction Cache Collectives for OLTP Workloads
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Reuse-based online models for caches
Proceedings of the ACM SIGMETRICS/international conference on Measurement and modeling of computer systems
A survey on cache tuning from a power/energy perspective
ACM Computing Surveys (CSUR)
OLTP in wonderland: where do cache misses come from in major OLTP components?
Proceedings of the Ninth International Workshop on Data Management on New Hardware
Explicit reservation of cache memory in a predictable, preemptive multitasking real-time system
ACM Transactions on Embedded Computing Systems (TECS)
An analytical approach for fast and accurate design space exploration of instruction caches
ACM Transactions on Embedded Computing Systems (TECS)
Modeling the impact of permanent faults in caches
ACM Transactions on Architecture and Code Optimization (TACO)
MICA: a holistic approach to fast in-memory key-value storage
NSDI'14 Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation
Hi-index | 15.05 |
The authors present new and efficient algorithms for simulating alternative direct-mapped and set-associative caches and use them to quantify the effect of limited associativity on the cache miss ratio. They introduce an algorithm, forest simulation, for simulating alternative direct-mapped caches and generalize one, which they call all-associativity simulation, for simulating alternative direct-mapped, set-associative, and fully-associative caches. The authors find that although all-associativity simulation is theoretically less efficient than forest simulation or stack simulation (a commonly used simulation algorithm), in practice it is not much slower and allows the simulation of many more caches with a single pass through an address trace. The authors also provide data and insight into how varying associatively affects the miss ratio.