The memory hierarchy continues to have a substantial effect on application performance. This paper explores how high-level application understanding can improve the performance of modern memory hierarchies by decomposing an application's often-chaotic address stream into multiple, more regular streams. We present two orthogonal methodologies. The first is a system called DTrack that decomposes the dynamic reference stream of a C program by tagging each reference with the name of its global variable or heap call site. The second is a technique for determining the correct granularity at which to study the global phase behavior of applications. Applying these twin analysis methods to twelve C SPEC2000 benchmarks, we demonstrate that they reveal data structure interactions that remain obscured under traditional aggregation-based analysis. The resulting characterization creates a rich profile of an application's memory behavior that highlights the most memory-intensive data structures and program phases, and we illustrate how this profile can lead system and application designers to a deeper understanding of the applications they study.
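To make the idea of decomposing a reference stream concrete, the following is a minimal sketch in Python. It is not DTrack itself, whose implementation the abstract does not describe. The function names `decompose` and `strides`, the region table, the labels, and the sample trace are all hypothetical and chosen only for illustration. The sketch assumes each data structure occupies a known, non-overlapping address range, tags every address in a trace with the name of the range it falls in, and shows that the per-name streams can be more regular than the interleaved whole.

```python
from bisect import bisect_right
from collections import defaultdict


def decompose(trace, regions):
    """Split an address trace into one stream per named region.

    trace   : iterable of integer addresses, in program order
    regions : list of (start, size, name) tuples; assumed non-overlapping
              (hypothetical stand-ins for globals and heap call sites)
    """
    regions = sorted(regions)
    starts = [start for start, _, _ in regions]
    streams = defaultdict(list)
    for addr in trace:
        # Find the last region whose start is <= addr.
        i = bisect_right(starts, addr) - 1
        if i >= 0 and addr < regions[i][0] + regions[i][1]:
            name = regions[i][2]
        else:
            name = "<unknown>"  # address outside every known region
        streams[name].append(addr)
    return dict(streams)


def strides(stream):
    """Differences between consecutive addresses in one stream."""
    return [b - a for a, b in zip(stream, stream[1:])]


# Hypothetical region table: one global array and one heap call site.
regions = [
    (0x1000, 0x100, "global:table"),
    (0x8000, 0x40, "heap:build_list@list.c:42"),
]

# Interleaved trace: irregular as a whole, regular per data structure.
trace = [0x1000, 0x8000, 0x1008, 0x8010, 0x1010, 0x9000]

streams = decompose(trace, regions)
table_strides = strides(streams["global:table"])
whole_strides = strides(trace)
```

In this toy trace, the combined stream jumps back and forth between distant addresses, while the stream tagged `global:table` advances by a constant stride. A real tool would also need to track allocation and deallocation over time, which this sketch omits.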