Cache performance strongly influences overall software performance, so researchers continue to use cache simulators to analyze cache behavior and evaluate optimizations. Most cache simulators, however, provide only raw, global information. To improve cache performance, developers need a better understanding of, for example, the impact of software optimizations and the behavior of new hardware cache designs. Cache behavior analysis is a two-step process: first, code sections with poor cache performance must be identified; second, the causes of poor performance in these code sections must be understood. Cache profilers handle the first task. The authors' Cache Visualization Tool (CVT) addresses the second and thus complements cache profilers. The tool both dynamically visualizes cache content and provides related statistics. A graphical tool for the X Window System, CVT has a main window that displays a grid representing the cache content. A cache is composed of cache blocks, also called cache lines (sets of words with consecutive addresses), and each box in the grid corresponds to one cache line. CVT is dedicated to visualizing the cache behavior of selected code sections rather than identifying critical code sections. The authors therefore intend to plug CVT into a profiler similar to CProf, which would handle the first step. Furthermore, by collecting information during the profiling run, such as loop boundaries and the coefficients of array subscripts, they intend to reduce the number of references that need to be traced.
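The grid-of-cache-lines idea can be illustrated with a small sketch. The abstract does not describe CVT's internals, so everything here is an assumption made for illustration: the class name DirectMappedCache, the direct-mapped organization, the default sizes, and the plain-text rendering that stands in for the graphical window. Each cell of the printed grid corresponds to one cache line and shows the tag currently stored there, and hit and miss counters play the role of the related statistics.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DirectMappedCache:
    """Hypothetical direct-mapped cache model (not CVT's actual design)."""

    num_lines: int = 16   # number of cache lines, i.e. boxes in the grid
    line_size: int = 32   # bytes per cache line
    tags: List[Optional[int]] = field(default_factory=list)
    hits: int = 0
    misses: int = 0

    def __post_init__(self) -> None:
        # Every line starts empty (None means "no block loaded yet").
        self.tags = [None] * self.num_lines

    def access(self, address: int) -> bool:
        """Simulate one memory reference; return True on a hit."""
        block = address // self.line_size   # memory block holding the address
        index = block % self.num_lines      # cache line the block maps to
        tag = block // self.num_lines       # distinguishes blocks sharing a line
        if self.tags[index] == tag:
            self.hits += 1
            return True
        # Miss: the new block replaces whatever the line held before.
        self.tags[index] = tag
        self.misses += 1
        return False

    def render(self, columns: int = 4) -> str:
        """Return the cache content as a text grid, one cell per cache line."""
        cells = ["--" if t is None else f"{t:02x}" for t in self.tags]
        rows = [" ".join(cells[i:i + columns])
                for i in range(0, len(cells), columns)]
        return "\n".join(rows)


if __name__ == "__main__":
    cache = DirectMappedCache()
    # Assumed trace: one sequential pass over 64 elements of 8 bytes each.
    for element in range(64):
        cache.access(element * 8)
    print(cache.render())
    print(f"hits={cache.hits} misses={cache.misses}")
```

Calling render after each reference, or after each loop iteration, would give a crude text analogue of watching the cache content change over time. A set-associative model would need several tags per grid cell.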