Identifying and Exploiting Spatial Regularity in Data Memory References

Authors:
Tushar Mohan;Bronis R. de Supinski;Sally A. McKee;Frank Mueller;Andy Yoo;Martin Schulz
Affiliations:
Lawrence Berkeley National Laboratory, Berkeley, CA;Center for Applied Scientific Computing, Livermore, CA;Cornell University, Ithaca, NY;North Carolina State University, Raleigh, NC;Center for Applied Scientific Computing, Livermore, CA;Cornell University, Ithaca, NY
Venue:
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Year:
2003

Abstract

The growing processor/memory performance gap causes the performance of many codes to be limited by memory accesses. Strided memory accesses that form streams, when known to exist in an application, can be targeted by optimizations such as prefetching, relocation, remapping, and vector loads. Undetected, they can be a significant source of memory stalls in loops. Existing stream-detection mechanisms either require special hardware, which may not gather statistics for subsequent analysis, or are limited to compile-time detection of array accesses in loops. The subject has received little formal treatment; the concept of locality fails to capture the existence of streams in a program's memory accesses. The contributions of this paper are as follows. First, we define spatial regularity as a means to discuss the presence and effects of streams. Second, we develop measures to quantify spatial regularity, and we design and implement an on-line, parallel algorithm to detect streams (and hence regularity) in running applications. Third, we use examples from real codes and common benchmarks to illustrate how derived stream statistics can be used to guide the application of profile-driven optimizations. Overall, we demonstrate the benefits of our novel regularity metric as an instrument to detect potential for code optimizations affecting memory performance.
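
To make the idea of a strided stream concrete, the following is a minimal sketch and not the paper's algorithm or its regularity metric. It assumes the input is the sequence of addresses issued by a single memory instruction, greedily splits that sequence into maximal constant-stride runs, and reports the fraction of accesses that fall inside such runs. The names `detect_streams`, `stream_coverage`, and the `min_length` threshold are hypothetical and chosen only for illustration.

```python
def detect_streams(addresses, min_length=3):
    """Greedily split an address trace into maximal constant-stride runs.

    Returns a list of (start_address, stride, length) tuples for every run
    with a nonzero stride and at least min_length accesses.
    Illustrative assumption: the trace comes from one memory instruction,
    so streams are not interleaved with unrelated accesses.
    """
    streams = []
    i, n = 0, len(addresses)
    while i < n - 1:
        stride = addresses[i + 1] - addresses[i]
        j = i + 1
        # Extend the run while consecutive differences match the stride.
        while j + 1 < n and addresses[j + 1] - addresses[j] == stride:
            j += 1
        length = j - i + 1
        if stride != 0 and length >= min_length:
            streams.append((addresses[i], stride, length))
            i = j + 1  # continue after the detected stream
        else:
            i += 1     # too short to count; retry from the next access
    return streams


def stream_coverage(addresses, min_length=3):
    """Fraction of accesses covered by detected streams (hypothetical measure)."""
    if not addresses:
        return 0.0
    covered = sum(length for _, _, length in detect_streams(addresses, min_length))
    return covered / len(addresses)


if __name__ == "__main__":
    # Two strided runs (stride 8, then stride 4) separated by irregular accesses.
    trace = [1000, 1008, 1016, 1024, 5000, 42, 7, 2000, 2004, 2008]
    print(detect_streams(trace))
    print(stream_coverage(trace))
```

A trace with high coverage under this sketch would be a candidate for stride-based optimizations such as prefetching, while a low value suggests irregular accesses. An on-line detector for running applications would need to process accesses incrementally and handle interleaved streams, which this offline sketch does not attempt.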