Evaluating Associativity in CPU Caches
IEEE Transactions on Computers
Cache performance of the integer SPEC benchmarks on a RISC
ACM SIGARCH Computer Architecture News
Analyzing and visualizing performance of memory hierarchies
Parallel computer systems
The cache performance and optimizations of blocked algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Performance debugging shared memory multiprocessor programs with MTOOL
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
MemSpy: analyzing memory system bottlenecks in programs
SIGMETRICS '92/PERFORMANCE '92 Proceedings of the 1992 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Page placement algorithms for large real-indexed caches
ACM Transactions on Computer Systems (TOCS)
ACM Computing Surveys (CSUR)
Software methods for improvement of cache performance on supercomputer applications
Software methods for improvement of cache performance on supercomputer applications
Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Design of cache memories for multi-threaded dataflow architecture
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Cache miss heuristics and preloading techniques for general-purpose programs
Proceedings of the 28th annual international symposium on Microarchitecture
Informing memory operations: providing memory performance feedback in modern processors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Integrating performance monitoring and communication in parallel computers
Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
The influence of caches on the performance of heaps
Journal of Experimental Algorithmics (JEA)
Mapping performance data for high-level and data views of parallel program performance
ICS '96 Proceedings of the 10th international conference on Supercomputing
Examination of a memory access classification scheme for pointer-intensive and numeric programs
ICS '96 Proceedings of the 10th international conference on Supercomputing
Active memory: a new abstraction for memory system simulation
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Fusion of Loops for Parallelism and Locality
IEEE Transactions on Parallel and Distributed Systems
Exploiting hardware performance counters with flow and context sensitive profiling
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Cache miss equations: an analytical representation of cache misses
ICS '97 Proceedings of the 11th international conference on Supercomputing
Shared-memory performance profiling
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Static timing analysis of embedded software
DAC '97 Proceedings of the 34th annual Design Automation Conference
Efficient instruction cache simulation and execution profiling with a threaded-code interpreter
Proceedings of the 29th conference on Winter simulation
Modeling set associative caches behavior for irregular computations
SIGMETRICS '98/PERFORMANCE '98 Proceedings of the 1998 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Eliminating conflict misses for high performance architectures
ICS '98 Proceedings of the 12th international conference on Supercomputing
Informing memory operations: memory performance feedback mechanisms and their applications
ACM Transactions on Computer Systems (TOCS)
Using GODIVA for data flow analysis
SPDT '98 Proceedings of the SIGMETRICS symposium on Parallel and distributed tools
Searching for the sorting record: experiences in tuning NOW-Sort
SPDT '98 Proceedings of the SIGMETRICS symposium on Parallel and distributed tools
Cache-conscious data placement
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Precise miss analysis for program transformations with caches of arbitrary associativity
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Augmenting Loop Tiling with Data Alignment for Improved Cache Performance
IEEE Transactions on Computers - Special issue on cache memory and related problems
Cache optimization in scientific computations
Proceedings of the 1999 ACM symposium on Applied computing
Cache conscious programming in undergraduate computer science
SIGCSE '99 The proceedings of the thirtieth SIGCSE technical symposium on Computer science education
The influence of caches on the performance of sorting
SODA '97 Proceedings of the eighth annual ACM-SIAM symposium on Discrete algorithms
Selective cache ways: on-demand cache resource allocation
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Cache miss equations: a compiler framework for analyzing and tuning memory behavior
ACM Transactions on Programming Languages and Systems (TOPLAS)
Automated cache optimizations using CME driven diagnosis
Proceedings of the 14th international conference on Supercomputing
Symbolic Cache Analysis for Real-Time Systems
Real-Time Systems - Special issue on worst-case execution-time analysis
Performance analysis using the MIPS R10000 performance counters
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Source-to-Source Instrumentation for the Optimization of an Automatic Reading System
The Journal of Supercomputing
Tools for application-oriented performance tuning
ICS '01 Proceedings of the 15th international conference on Supercomputing
Exact analysis of the cache behavior of nested loops
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Tuning Strassen's matrix multiplication for memory efficiency
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
More enhancements of the simplescalar tool set
ACM SIGARCH Computer Architecture News
Reducing Cache Conflicts by Multi-Level Cache Partitioning and Array Elements Mapping
The Journal of Supercomputing
Towards a theory of cache-efficient algorithms
Journal of the ACM (JACM)
Computer
Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance
IEEE Transactions on Computers
Cache-Efficient Multigrid Algorithms
ICCS '01 Proceedings of the International Conference on Computational Sciences-Part I
A Programmable Memory Hierarchy for Prefetching Linked Data Structures
ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
SIP: Performance Tuning through Source Code Interdependence
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Annotated Memory References: A Mechanism for Informed Cache Management
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Set Associative Cache Behavior Optimization
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Cache Conscious Algorithms for Relational Query Processing
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Compiler managed micro-cache bypassing for high performance EPIC processors
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
METRIC: tracking down inefficiencies in the memory hierarchy via binary rewriting
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Generating Dynamic Program Analysis Tools
ASWEC '97 Proceedings of the Australian Software Engineering Conference
Cache Performance and Algorithm Optimization
HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
A Performance Debugger for Eliminating Excess Synchronization in Shared-Memory Parallel Programs
MASCOTS '96 Proceedings of the 4th International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems
Caching-Efficient Multithreaded Fast Multiplication of Sparse Matrices
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
A fast and accurate framework to analyze and optimize cache memory behavior
ACM Transactions on Programming Languages and Systems (TOPLAS)
A compiler tool to predict memory hierarchy performance of scientific codes
Parallel Computing
Detailed cache coherence characterization for OpenMP benchmarks
Proceedings of the 18th annual international conference on Supercomputing
AccMon: Automatically Detecting Memory-Related Bugs via Program Counter-Based Invariants
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Tolerating memory latency through push prefetching for pointer-intensive applications
ACM Transactions on Architecture and Code Optimization (TACO)
Data Centric Cache Measurement on the Intel ltanium 2 Processor
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Fast data-locality profiling of native execution
SIGMETRICS '05 Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Cache-Efficient Multigrid Algorithms
International Journal of High Performance Computing Applications
A hybrid hardware/software approach to efficiently determine cache coherence Bottlenecks
Proceedings of the 19th annual international conference on Supercomputing
Intermediately executed code is the key to find refactorings that improve temporal data locality
Proceedings of the 3rd conference on Computing frontiers
Decomposing memory performance: data structures and phases
Proceedings of the 5th international symposium on Memory management
Analysis of cache-coherence bottlenecks with hybrid hardware/software techniques
ACM Transactions on Architecture and Code Optimization (TACO)
METRIC: Memory tracing via dynamic binary rewriting to identify cache inefficiencies
ACM Transactions on Programming Languages and Systems (TOPLAS)
Source-Code-Correlated Cache Coherence Characterization of OpenMP Benchmarks
IEEE Transactions on Parallel and Distributed Systems
Effective support of simulation in computer architecture instruction
WCAE '02 Proceedings of the 2002 workshop on Computer architecture education: Held in conjunction with the 29th International Symposium on Computer Architecture
Fast indexing for blocked array layouts to reduce cache misses
International Journal of High Performance Computing and Networking
Algorithms for memory hierarchies: advanced lectures
Algorithms for memory hierarchies: advanced lectures
Locating cache performance bottlenecks using data profiling
Proceedings of the 5th European conference on Computer systems
A compiler framework for the reduction of worst-case execution times
Real-Time Systems
Cache index-aware memory allocation
Proceedings of the international symposium on Memory management
RDVIS: a tool that visualizes the causes of low locality and hints program optimizations
ICCS'05 Proceedings of the 5th international conference on Computational Science - Volume Part II
Pinpointing data locality problems using data-centric analysis
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Phase guided profiling for fast cache modeling
Proceedings of the Tenth International Symposium on Code Generation and Optimization
THeME: a system for testing by hardware monitoring events
Proceedings of the 2012 International Symposium on Software Testing and Analysis
A survey on cache tuning from a power/energy perspective
ACM Computing Surveys (CSUR)
A data-centric profiler for parallel programs
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Hi-index | 4.11 |
A vital tool-box component, the CProf cache profiling system lets programmers identify hot spots by providing cache performance information at the source-line and data-structure level. Our purpose is to introduce a broad audience to cache performance profiling and tuning techniques. Although used sporadically in the supercomputer and multiprocessor communities, these techniques also have broad applicability to programs running on fast uniprocessor workstations. We show that cache profiling, using our CProf cache profiling system, improves program performance by focusing a programmer's attention on problematic code sections and providing insight into appropriate program transformations.