Pinpointing data locality problems using data-centric analysis

Authors:
Xu Liu;John Mellor-Crummey
Affiliations:
Dept. of Computer Science MS 132, Rice University, P.O. Box 1892, Houston, TX 77251-1892;Dept. of Computer Science MS 132, Rice University, P.O. Box 1892, Houston, TX 77251-1892
Venue:
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Year:
2011

Abstract

In modern computer architectures, access latency varies considerably across the levels of the memory hierarchy. Consequently, applications whose data access patterns reuse little data in the fast levels of the hierarchy incur additional delays. To improve the performance of complex, data-intensive applications, developers need tools that help them understand the causes of poor memory hierarchy utilization. While most performance tools associate metrics with functions or statements, in this paper we explore data-centric analyses that associate metrics not only with data accesses but also with the data objects themselves. Our contributions are threefold. First, we propose several refinements to existing data-centric techniques that enable accurate, low-overhead measurement. Second, we combine data-centric analysis with call path profiling; this combination relates inefficient access patterns back to data objects across complete dynamic call chains. Third, we develop a graphical user interface that presents our analysis results in multiple views, helping users identify problematic accesses and data structures. We demonstrate the utility of our approach by showing how our tool identifies problematic data access patterns in several HPC applications and a pair of SPEC CPU2006 benchmarks.
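
The idea of attributing metrics to data objects as well as to code can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a profiler that (a) records each heap allocation together with the call path that performed it and (b) receives sampled memory accesses carrying an effective address, a latency, and the call path of the access. All names here (DataObjectMap, attribute, the sample tuples, the addresses) are hypothetical and chosen only for illustration.

```python
import bisect
from collections import defaultdict


class DataObjectMap:
    """Maps live address ranges to the call path that allocated them (hypothetical helper)."""

    def __init__(self):
        self._starts = []   # sorted start addresses of live blocks
        self._blocks = {}   # start address -> (end address, allocation call path)

    def on_alloc(self, start, size, alloc_path):
        if start in self._blocks:      # address reused without an observed free
            self.on_free(start)
        bisect.insort(self._starts, start)
        self._blocks[start] = (start + size, tuple(alloc_path))

    def on_free(self, start):
        if start in self._blocks:
            del self._blocks[start]
            self._starts.pop(bisect.bisect_left(self._starts, start))

    def lookup(self, addr):
        """Return the allocation call path of the block containing addr, or None."""
        i = bisect.bisect_right(self._starts, addr) - 1
        if i < 0:
            return None
        start = self._starts[i]
        end, path = self._blocks[start]
        return path if addr < end else None


def attribute(samples, objects):
    """Aggregate sampled accesses by (data object, access call path).

    samples: iterable of (effective_address, latency_cycles, access_call_path).
    """
    metrics = defaultdict(lambda: {"samples": 0, "latency": 0})
    for addr, latency, access_path in samples:
        obj = objects.lookup(addr) or ("<unknown data>",)
        entry = metrics[(obj, tuple(access_path))]
        entry["samples"] += 1
        entry["latency"] += latency
    return dict(metrics)


# Made-up demo data: two heap objects and four sampled accesses.
objects = DataObjectMap()
objects.on_alloc(0x1000, 256, ["main", "init_grid", "malloc"])
objects.on_alloc(0x2000, 64, ["main", "init_index", "malloc"])

samples = [
    (0x1010, 300, ["main", "sweep"]),
    (0x10F8, 280, ["main", "sweep"]),
    (0x2004, 12, ["main", "lookup"]),
    (0x5000, 40, ["main", "sweep"]),   # falls outside every known object
]

result = attribute(samples, objects)
for (obj, access), m in sorted(result.items()):
    print(" > ".join(obj), "| accessed in", " > ".join(access), "|", m)
```

Keying each metric on both the allocation call path and the access call path is what lets a single report answer two questions at once: which data object is expensive to access, and from which calling context the expensive accesses originate. A real tool would also need to handle static and stack data, which this sketch simply lumps into an unknown bucket.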