Bottleneck identification and scheduling in multithreaded applications

Authors:
José A. Joao; M. Aater Suleman; Onur Mutlu; Yale N. Patt
Affiliations:
The University of Texas at Austin, Austin, TX, USA; Calxeda Inc., Austin, TX, USA; Carnegie Mellon University, Pittsburgh, PA, USA; The University of Texas at Austin, Austin, TX, USA
Venue:
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Year:
2012

Abstract

The performance of multithreaded applications is limited by a variety of bottlenecks, e.g., critical sections, barriers, and slow pipeline stages. These bottlenecks serialize execution, waste valuable execution cycles, and limit the scalability of applications. This paper proposes Bottleneck Identification and Scheduling in Multithreaded Applications (BIS), a cooperative software-hardware mechanism to identify and accelerate the most critical bottlenecks. BIS identifies which bottlenecks are likely to reduce performance by measuring the number of cycles threads have to wait for each bottleneck, and accelerates those bottlenecks using one or more fast cores on an Asymmetric Chip Multi-Processor (ACMP). Unlike previous work that targets specific bottlenecks, BIS can identify and accelerate bottlenecks regardless of their type. We compare BIS to four previous approaches and show that it outperforms the best of them by 15% on average. The performance improvement of BIS grows as the number of cores and the number of fast cores in the system increase.
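
The identification step described in the abstract (count the cycles threads spend waiting on each bottleneck, then accelerate the worst ones) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the mechanism from the paper: the class `BottleneckTable`, its methods, the bottleneck names, the `waiters * cycles` accounting, and the policy of one bottleneck per fast core are all hypothetical choices made for illustration.

```python
from collections import defaultdict


class BottleneckTable:
    """Toy model (hypothetical): total thread waiting cycles per bottleneck."""

    def __init__(self):
        # bottleneck id -> accumulated thread waiting cycles
        self.waiting_cycles = defaultdict(int)

    def record_wait(self, bottleneck_id, waiters, cycles):
        # Assumption: every waiting thread loses `cycles` cycles, so the
        # cost of this interval is waiters * cycles.
        self.waiting_cycles[bottleneck_id] += waiters * cycles

    def most_critical(self, num_fast_cores):
        # Rank bottlenecks by accumulated waiting cycles, highest first,
        # and keep as many as there are fast cores (assumed policy).
        ranked = sorted(
            self.waiting_cycles.items(), key=lambda kv: kv[1], reverse=True
        )
        return [bid for bid, _ in ranked[:num_fast_cores]]


table = BottleneckTable()
table.record_wait("lock_A", waiters=3, cycles=100)           # critical section
table.record_wait("barrier_1", waiters=7, cycles=50)         # barrier
table.record_wait("pipeline_stage_2", waiters=1, cycles=120)  # slow stage
table.record_wait("lock_A", waiters=2, cycles=40)

selected = table.most_critical(num_fast_cores=2)
print(selected)  # ['lock_A', 'barrier_1']
```

Because the table ranks bottlenecks only by waiting cycles, a critical section, a barrier, and a pipeline stage are all treated the same way, which mirrors the abstract's point that bottlenecks are handled regardless of their type.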