With the growing awareness that individual hardware cores will not continue to produce the same level of performance improvement, there is a need to develop an integrated approach to performance optimization. In this paper we present a paradigm for Continuous Program Optimization (CPO), whereby automatic agents monitor and optimize application and system performance. The monitoring data is used to analyze and create models of application and system behavior. Using this analysis, we describe how CPO agents can improve the performance of both the application and the underlying system. Using the CPO paradigm, we implemented cooperating page size optimization agents that automatically optimize large page usage. An offline agent uses vertically integrated performance data to produce a page size benefit analysis for different categories of data structures within an application. We show how an online CPO agent can use the results of the predictive analysis to automatically improve application performance. We validate that the predictions made by the CPO agent reflect the actual performance gains of up to 60% across a range of scientific applications, including the SPECcpu2000 floating point benchmarks and two large high performance computing (HPC) applications.