Automating parallel performance experimentation, analysis, and problem diagnosis can enhance environments for performance-directed application development, compilation, and execution. This is especially true when parametric studies, modeling, and optimization strategies require large amounts of data to be collected and processed for knowledge synthesis and reuse. This paper describes the integration of the PerfExplorer performance data mining framework with the OpenUH compiler infrastructure. OpenUH provides auto-instrumentation of source code for performance experimentation, and PerfExplorer provides automated and reusable analysis of the performance data through a scripting interface. More importantly, PerfExplorer inference rules have been developed to recognize and diagnose performance characteristics important for optimization strategies and modeling. Three case studies demonstrate this automation in OpenMP and MPI code tuning, parametric characterization, and power modeling. The paper discusses how the integration supports performance knowledge engineering across applications and feedback-based compiler optimization in general.
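The idea of inference rules that recognize and diagnose performance characteristics can be illustrated with a small, generic sketch. This is not PerfExplorer's scripting interface or rule syntax; the `Rule` class, the `diagnose` function, the metric names, and the thresholds are all hypothetical and chosen only to show the general shape of rule-based diagnosis over per-region profile data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical profile record: metric name -> value for one code region.
Profile = Dict[str, float]


@dataclass
class Rule:
    """A named condition over a profile, plus the diagnosis it implies."""
    name: str
    condition: Callable[[Profile], bool]
    diagnosis: str


def diagnose(region: str, profile: Profile, rules: List[Rule]) -> List[str]:
    """Return the diagnosis text of every rule whose condition holds."""
    return [f"{region}: {r.diagnosis}" for r in rules if r.condition(profile)]


# Illustrative thresholds only (assumptions, not values from the paper).
RULES = [
    Rule(
        "load_imbalance",
        lambda p: p["max_thread_time"] > 1.25 * p["mean_thread_time"],
        "load imbalance across threads",
    ),
    Rule(
        "memory_bound",
        lambda p: p["l2_misses"] / p["instructions"] > 0.01,
        "high L2 miss rate; region may be memory bound",
    ),
]

if __name__ == "__main__":
    # Made-up measurements for a single instrumented region.
    sample = {
        "max_thread_time": 4.0,
        "mean_thread_time": 2.5,
        "l2_misses": 2.0e6,
        "instructions": 1.0e9,
    }
    for line in diagnose("loop_42", sample, RULES):
        print(line)
```

Keeping each rule as data (a name, a predicate, and a message) is one way to make diagnoses reusable across applications: the same rule list can be applied to profiles from different experiments without changing the analysis code.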