Identifying potential parallelism via loop-centric profiling

Authors:
Tipp Moseley;Daniel A. Connors;Dirk Grunwald;Ramesh Peri
Affiliations:
University of Colorado at Boulder, Boulder, CO;University of Colorado at Boulder, Boulder, CO;University of Colorado at Boulder, Boulder, CO;Intel Corporation, Austin, TX
Venue:
Proceedings of the 4th international conference on Computing frontiers
Year:
2007

Abstract

The transition to multithreaded, multi-core designs places a greater responsibility on programmers and software for improving performance; thread-level parallelism (TLP) will be increasingly relied upon in addition to instruction-level parallelism (ILP) and increased clock frequency. Deciding where to try to parallelize code is difficult, especially for large, complex applications or those where the original developers have moved on. Outer loops are relatively easy targets for parallelization, but traditional profilers focus primarily on functions and hot inner loops. To aid in programmers' parallelization efforts, we introduce the concept of loop-centric profiling to provide a hierarchical view of how much time is spent in a loop and the loops nested within it.

This paper introduces two techniques for loop profiling. First, we describe an instrumentation-based approach that gathers highly detailed and accurate information about loop behavior. Second, we present a sampling approach that achieves similar results with negligible overhead. The paper concludes with a case study evaluating the tool on several SPEC 2000 benchmarks.
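
As a rough illustration of the hierarchical view described in the abstract, the sketch below aggregates profile samples into per-loop inclusive and self counts. It is not the authors' tool: the sample format (a tuple of enclosing loop names, outermost first), the loop names, and the functions `loop_profile` and `report` are all hypothetical, chosen only to show how time in an outer loop can be reported together with the loops nested within it.

```python
from collections import Counter


def loop_profile(samples):
    """Aggregate samples into inclusive and self counts per loop path.

    Each sample is a tuple naming the loops that enclose the sampled
    instruction, outermost first (an assumed format, not the paper's).
    """
    inclusive = Counter()  # samples in the loop or any loop nested in it
    self_time = Counter()  # samples where this loop is the innermost one
    for stack in samples:
        if not stack:
            continue  # sample fell outside every loop
        self_time[stack] += 1
        for depth in range(1, len(stack) + 1):
            inclusive[stack[:depth]] += 1
    return inclusive, self_time


def report(inclusive, self_time, total):
    """Format the loop hierarchy, one indented line per loop."""
    lines = []
    # Sorting tuples places each outer loop directly before its nested loops.
    for path in sorted(inclusive):
        indent = "  " * (len(path) - 1)
        pct = 100.0 * inclusive[path] / total
        lines.append(
            f"{indent}{path[-1]}: inclusive {pct:.1f}%, "
            f"self {self_time[path]} samples"
        )
    return lines


# Hypothetical samples: eight in total, one of them outside any loop.
samples = [
    ("outer",),
    ("outer", "inner_a"),
    ("outer", "inner_a"),
    ("outer", "inner_a"),
    ("outer", "inner_b"),
    ("outer", "inner_b"),
    ("init",),
    (),
]

inclusive, self_time = loop_profile(samples)
for line in report(inclusive, self_time, len(samples)):
    print(line)
```

In this made-up data, a profiler that ranks only innermost loops would list `inner_a` and `inner_b` separately, while the inclusive count shows that `outer` accounts for six of the eight samples and may be the more promising parallelization candidate.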