A knowledge discovery methodology for the performance evaluation of scientific software
Neural, Parallel & Scientific Computations
A methodology for scientific benchmarking with large-scale applications
Performance evaluation and benchmarking with realistic applications
An efficient static analysis algorithm to detect redundant memory operations
Proceedings of the 2002 workshop on Memory system performance
Gprof: A call graph execution profiler
SIGPLAN '82 Proceedings of the 1982 SIGPLAN symposium on Compiler construction
Parallel Programmer Productivity: A Case Study of Novice Parallel Programmers
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
PerfExplorer: A Performance Data Mining Framework For Large-Scale Parallel Computing
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
The Tau Parallel Performance System
International Journal of High Performance Computing Applications
Identifying potential parallelism via loop-centric profiling
Proceedings of the 4th international conference on Computing frontiers
Effective performance measurement and analysis of multithreaded applications
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Simsys: a performance simulation framework
Proceedings of the 2013 Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools
Hi-index | 0.00 |
Current hardware trends place increasing pressure on programmers and tools to optimize scientific code. Numerous tools and techniques exist, but no single tool is a panacea; instead, different tools have different strengths. Therefore, an assortment of performance tuning utilities and strategies are necessary to best utilize scarce resources (e.g., bandwidth, functional units, cache). This paper describes a combined methodology for the optimization process. The strategy combines static assembly analysis using MAQAO with dynamic information from hardware performance monitoring (HPM) and memory traces. We introduce a new technique, decremental analysis (DECAN), to iteratively identify the individual instructions responsible for performance bottlenecks. We present case studies on applications from several independent software vendors (ISVs) on a SMP Xeon Core 2 platform. These strategies help discover problems related to memory access locality and loop unrolling that lead to a sequential performance improvement of a factor of 2.