Cache optimization is becoming increasingly important for achieving high computing performance, especially on current and future chip-multiprocessor (CMP) systems, which usually show a higher cache miss ratio than uniprocessors. Such optimization requires information about access locality to help the user with data allocation, data transformation, and code transformation, which are often used to improve the utilization of cached data and thereby the cache hit rate. In this paper we demonstrate an analysis tool that detects the spatial and temporal relationships between memory accesses and provides information, such as access patterns and access strides, that is required for applying optimization techniques like address grouping, software prefetching, and code transformation. Based on the memory access trace generated by a code instrumentor, the analysis tool uses appropriate algorithms to detect repeated address sequences and the constant distance between accesses to different elements of a data structure. This allows users to pack data with spatial locality into the same cache block so that the needed data is loaded into the cache at the same time. In addition, the analysis tool computes the push back distance, which shows how a cache miss can be avoided by reusing the data before it is replaced. This helps to reduce cache misses and thereby increases the temporal reuse of the working set.
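To make the notion of an access stride concrete, the following is a minimal sketch of how a constant distance between consecutive accesses might be detected in an address trace. It is not the tool's actual algorithm, which the abstract does not specify. The function name `dominant_stride`, the `min_share` threshold, and the example trace are all assumptions made for illustration.

```python
from collections import Counter
from typing import Optional, Sequence


def dominant_stride(addresses: Sequence[int], min_share: float = 0.8) -> Optional[int]:
    """Return the most frequent non-zero distance between consecutive
    addresses if it covers at least `min_share` of all steps, else None.

    Hypothetical helper: the 0.8 threshold is an illustrative assumption.
    """
    if len(addresses) < 3:
        return None  # too few accesses to call anything a pattern
    # Distance (in bytes) from each access to the next one in the trace.
    deltas = [b - a for a, b in zip(addresses, addresses[1:])]
    stride, count = Counter(deltas).most_common(1)[0]
    if stride != 0 and count / len(deltas) >= min_share:
        return stride
    return None


# Hypothetical trace: one field read from an array of 24-byte records.
trace = [0x1000 + 24 * i for i in range(16)]
stride = dominant_stride(trace)
```

A stride reported this way (24 bytes in the hypothetical trace) hints that the accessed elements are spread across more cache blocks than necessary, which is the kind of information that address grouping or software prefetching would build on.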