Memory profiling is the process of characterizing a program's memory behavior by observing and recording its response to specific input sets. Relevant aspects of the program's memory behavior may then be used to guide memory optimizations in an aggressively optimizing compiler. In general, memory access behavior has eluded meaningful characterization because of confounding artifacts from memory allocators, linker data layout, and OS memory management. Since these artifacts may change from run to run, memory access patterns may appear different in each run even for the same input set. Worse, regular memory access behavior, such as linked-list traversal, appears to have no structure.

In this paper we present object-relative translation and decomposition techniques to eliminate these artifacts and to expose previously obscured memory access patterns. To demonstrate the potential of these ideas, we implement two different memory profilers targeted at different sets of applications. These profilers outperform the existing ones in terms of profile size and useful information per byte of data. The first profiler is a lossless profiler, called WHOMP, which uses object-relativity to achieve 22% better compression than the previously best known scheme. The second profiler, called LEAP, uses lossy compression to get highly compact profiles while providing useful information to the targeted applications. LEAP correctly characterizes the memory alias rates for 56% more instruction pairs than the previously best known scheme, with a practical running time.
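
The idea behind object-relative translation can be illustrated with a minimal sketch. It is not the WHOMP or LEAP implementation: the event format, the `ObjectTable` class, and the choice of allocation order as the object identifier are all assumptions made for illustration. Each raw address is rewritten as an (object ID, offset) pair, so that two runs whose allocators place the same objects at different addresses produce the same trace.

```python
from bisect import bisect_right


class ObjectTable:
    """Hypothetical table of live objects, keyed by base address.

    Object IDs are assigned in allocation order, which is assumed here
    to be stable across runs on the same input.
    """

    def __init__(self):
        self._starts = []   # sorted base addresses of live objects
        self._info = {}     # base address -> (object_id, size)
        self._next_id = 0

    def on_alloc(self, base, size):
        self._starts.insert(bisect_right(self._starts, base), base)
        self._info[base] = (self._next_id, size)
        self._next_id += 1

    def on_free(self, base):
        self._starts.remove(base)
        del self._info[base]

    def translate(self, addr):
        """Return (object_id, offset), or None if addr hits no live object."""
        idx = bisect_right(self._starts, addr) - 1
        if idx < 0:
            return None
        base = self._starts[idx]
        obj_id, size = self._info[base]
        if addr >= base + size:
            return None
        return (obj_id, addr - base)


def object_relative_trace(events):
    """Translate a stream of (kind, address, size) events.

    kind is "alloc", "free", or "access"; size is ignored unless kind
    is "alloc". This event format is an assumption of the sketch.
    """
    table = ObjectTable()
    trace = []
    for kind, addr, size in events:
        if kind == "alloc":
            table.on_alloc(addr, size)
        elif kind == "free":
            table.on_free(addr)
        else:
            trace.append(table.translate(addr))
    return trace


# Two runs of the same three-node list traversal. The allocator places
# the nodes at different raw addresses in each run.
run_a = [("alloc", 0x1000, 16), ("alloc", 0x8040, 16), ("alloc", 0x2400, 16),
         ("access", 0x1008, 0), ("access", 0x8048, 0), ("access", 0x2408, 0)]
run_b = [("alloc", 0x7000, 16), ("alloc", 0x7100, 16), ("alloc", 0x9990, 16),
         ("access", 0x7008, 0), ("access", 0x7108, 0), ("access", 0x9998, 0)]

trace_a = object_relative_trace(run_a)
trace_b = object_relative_trace(run_b)
print(trace_a == trace_b, trace_a)
```

In this sketch the raw address sequences of the two runs share nothing, while both translated traces read as consecutive object IDs at a fixed offset of 8. That kind of regularity is what allocator placement hides in a raw address trace of a linked-list traversal.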