Criticality stacks: identifying critical threads in parallel programs using synchronization behavior

Authors:
Kristof Du Bois;Stijn Eyerman;Jennifer B. Sartor;Lieven Eeckhout
Affiliations:
Ghent University, Belgium;Ghent University, Belgium;Ghent University, Belgium;Ghent University, Belgium
Venue:
Proceedings of the 40th Annual International Symposium on Computer Architecture
Year:
2013

Citing 32
Cited 3

Transactional memory: architectural support for lock-free data structures

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The SPLASH-2 programs: characterization and methodological considerations

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
An online computation of critical path profiling

SPDT '96 Proceedings of the SIGMETRICS symposium on Parallel and distributed tools
Wattch: a framework for architectural-level power analysis and optimizations

Proceedings of the 27th annual international symposium on Computer architecture
Focusing processor policies via critical-path prediction

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Speculative lock elision: enabling highly concurrent multithreaded execution

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Transactional lock-free execution of lock-based programs

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Speculative synchronization: applying thread-level speculation to explicitly parallel applications

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Quantifying instruction criticality for shared memory multiprocessors

Proceedings of the fifteenth annual ACM symposium on Parallel algorithms and architectures
Dynamic Prediction of Critical Path Instructions

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Exploiting Barriers to Optimize Power Consumption of CMPs

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Mitigating Amdahl's Law through EPI Throttling

Proceedings of the 32nd annual international symposium on Computer Architecture
The Thrifty Barrier: Energy-Aware Synchronization in Shared-Memory Multiprocessors

HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Performance, Power Efficiency and Scalability of Asymmetric Cluster Chip Multiprocessors

IEEE Computer Architecture Letters
Spin Detection Hardware for Improved Management of Multithreaded Systems

IEEE Transactions on Parallel and Distributed Systems
Dynamic recognition of synchronization operations for improved data race detection

ISSTA '08 Proceedings of the 2008 international symposium on Software testing and analysis
The PARSEC benchmark suite: characterization and architectural implications

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Meeting points: using thread criticality to adapt multicore hardware to parallel regions

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Validity of the single processor approach to achieving large scale computing capabilities

AFIPS '67 (Spring) Proceedings of the April 18-20, 1967, spring joint computer conference
Accelerating critical section execution with asymmetric multi-core architectures

Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Thread criticality predictors for dynamic performance, power, and resource management in chip multiprocessors

Proceedings of the 36th annual international symposium on Computer architecture
End-to-end performance forecasting: finding bottlenecks before they happen

Proceedings of the 36th annual international symposium on Computer architecture
Age based scheduling for asymmetric multiprocessors

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Rodinia: A benchmark suite for heterogeneous computing

IISWC '09 Proceedings of the 2009 IEEE International Symposium on Workload Characterization (IISWC)
Analyzing lock contention in multithreaded applications

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Modeling critical sections in Amdahl's law and its implications for multicore design

Proceedings of the 37th annual international symposium on Computer architecture
The gem5 simulator

ACM SIGARCH Computer Architecture News
Bottleneck identification and scheduling in multithreaded applications

ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Booster: Reactive core acceleration for mitigating the effects of process variation and application imbalance in low-voltage chips

HPCA '12 Proceedings of the 2012 IEEE 18th International Symposium on High-Performance Computer Architecture
Speedup stacks: Identifying scaling bottlenecks in multi-threaded applications

ISPASS '12 Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software
Critical lock analysis: diagnosing critical section bottlenecks in multithreaded applications

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Bottle graphs: visualizing scalability bottlenecks in multi-threaded applications

Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications
Fairness-aware scheduling on single-ISA heterogeneous multi-cores

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
The benefit of SMT in the multi-core era: flexibility towards degrees of thread-level parallelism

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Analyzing multi-threaded programs is quite challenging, but is necessary to obtain good multicore performance while saving energy. Due to synchronization, certain threads make others wait, because they hold a lock or have yet to reach a barrier. We call these critical threads, i.e., threads whose performance is determinative of program performance as a whole. Identifying these threads can reveal numerous optimization opportunities, for the software developer and for hardware. In this paper, we propose a new metric for assessing thread criticality, which combines both how much time a thread is performing useful work and how many co-running threads are waiting. We show how thread criticality can be calculated online with modest hardware additions and with low overhead. We use our metric to create criticality stacks that break total execution time into each thread's criticality component, allowing for easy visual analysis of parallel imbalance. To validate our criticality metric, and demonstrate it is better than previous metrics, we scale the frequency of the most critical thread and show it achieves the largest performance improvement. We then demonstrate the broad applicability of criticality stacks by using them to perform three types of optimizations: (1) program analysis to remove parallel bottlenecks, (2) dynamically identifying the most critical thread and accelerating it using frequency scaling to improve performance, and (3) showing that accelerating only the most critical thread allows for targeted energy reduction.