Reducing TLB and memory overhead using online superpage promotion
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Does “just in time” = “better late than never”?
Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Fine-grained dynamic instrumentation of commodity operating system kernels
OSDI '99 Proceedings of the third symposium on Operating systems design and implementation
Performance analysis using the MIPS R10000 performance counters
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
An infrastructure for multiprocessor run-time adaptation
WOSS '02 Proceedings of the first workshop on Self-healing systems
Continuous program optimization: A case study
ACM Transactions on Programming Languages and Systems (TOPLAS)
Continuous Compilation: A New Approach to Aggressive and Adaptive Code Transformation
IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
Practical, transparent operating system support for superpages
ACM SIGOPS Operating Systems Review - OSDI '02: Proceedings of the 5th symposium on Operating systems design and implementation
SvPablo: A Multi-Language Architecture-Independent Performance Analysis System
ICPP '99 Proceedings of the 1999 International Conference on Parallel Processing
A brief history of just-in-time
ACM Computing Surveys (CSUR)
Automatically Tuned Linear Algebra Software
Automatically Tuned Linear Algebra Software
Strata-various: multi-layer visualization of dynamics in software system behavior
VIS '94 Proceedings of the conference on Visualization '94
A Dynamically Tuned Sorting Library
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Vertical profiling: understanding the behavior of object-priented applications
OOPSLA '04 Proceedings of the 19th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Efficient, Unified, and Scalable Performance Monitoring for Multiprocessor Operating Systems
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Proceedings of the 32nd annual international symposium on Computer Architecture
Spiral: A Generator for Platform-Adapted Libraries of Signal Processing Algorithms
International Journal of High Performance Computing Applications
Multiple Page Size Modeling and Optimization
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Dynamic instrumentation of production systems
ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference
A characterization of shared data access patterns in UPC programs
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Proceedings of the 7th international conference on Autonomic computing
PerfExpert: An Easy-to-Use Performance Diagnosis Tool for HPC Applications
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
ASE '11 Proceedings of the 2011 26th IEEE/ACM International Conference on Automated Software Engineering
Metronome: operating system level performance management via self-adaptive computing
Proceedings of the 49th Annual Design Automation Conference
Hi-index | 0.00 |
Our research is aimed at characterizing, understanding, and exploiting the interactions between hardware and software to improve system performance. We have developed a paradigm for continuous program optimization (CPO) that assists in and automates the challenging task of pelformance tuning, and we have implemented an initial prototype of this paradigm. At the core of our implementation is a performance- and environment-monitoring (PEM) component that vertically integrates performance events from various layers in the execution stack. CPO agents use the data provided by PEM to detect, diagnose, and alleviate performance problems on existing systems. In addition, CPO can be used to improve future architecture designs by analyzing PEM data collected on a whole-system simulator while varying architectural characteristics. In this paper, we present the CPO paradigm, describe an initial implementation that includes PEM as a component, and discuss two CPO clients.