Removing architectural bottlenecks to the scalability of speculative parallelization

Authors:
Milos Prvulovic;María Jesús Garzarán;Lawrence Rauchwerger;Josep Torrellas
Affiliations:
University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;Texas A&M University;University of Illinois at Urbana-Champaign
Venue:
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Year:
2001

Citing 19
Cited 22

The LRPD test: speculative run-time parallelization of loops with privatization and reduction parallelization

PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Multiscalar processors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
A dynamic multithreading processor

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Data speculation support for a chip multiprocessor

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Clustered speculative multithreaded processors

ICS '99 Proceedings of the 13th international conference on Supercomputing
A Chip-Multiprocessor Architecture with Speculative Multithreading

IEEE Transactions on Computers
The Superthreaded Processor Architecture

IEEE Transactions on Computers
An architecture for mostly functional languages

LFP '86 Proceedings of the 1986 ACM conference on LISP and functional programming
The directory-based cache coherence protocol for the DASH multiprocessor

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
A scalable approach to thread-level speculation

Proceedings of the 27th annual international symposium on Computer architecture
Architectural support for scalable speculative parallelization in shared-memory multiprocessors

Proceedings of the 27th annual international symposium on Computer architecture
Techniques for speculative run-time parallelization of loops

SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Parallel Programming with Polaris

Computer
SPEC CPU2000: Measuring CPU Performance in the New Millennium

Computer
MINT: A Front End for Efficient Simulation of Shared-Memory Multiprocessors

MASCOTS '94 Proceedings of the Second International Workshop on Modeling, Analysis, and Simulation On Computer and Telecommunication Systems
Speculative Versioning Cache

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Hardware for Speculative Run-Time Parallelization in Distributed Shared-Memory Multiprocessors

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
An Direct-Execution Framework for Fast and Accurate Simulation of Superscalar Processors

PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
In Search of Speculative Thread-Level Parallelism

PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques

Tradeoffs in Buffering Memory State for Thread-Level Speculation in Multiprocessors

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
ReEnact: using thread-level speculation mechanisms to debug data races in multithreaded codes

Proceedings of the 30th annual international symposium on Computer architecture
Virtualizing Transactional Memory

Proceedings of the 32nd annual international symposium on Computer Architecture
Design Space Exploration of a Software Speculative Parallelization Scheme

IEEE Transactions on Parallel and Distributed Systems
The STAMPede approach to thread-level speculation

ACM Transactions on Computer Systems (TOCS)
Thread-Level Speculation on a CMP can be energy efficient

Proceedings of the 19th annual international conference on Supercomputing
Tradeoffs in buffering speculative memory state for thread-level speculation in multiprocessors

ACM Transactions on Architecture and Code Optimization (TACO)
Unbounded Transactional Memory

IEEE Micro
Energy-Efficient Thread-Level Speculation

IEEE Micro
Hybrid transactional memory

Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Tolerating Dependences Between Large Speculative Threads Via Sub-Threads

Proceedings of the 33rd annual international symposium on Computer Architecture
PathExpander: Architectural Support for Increasing the Path Coverage of Dynamic Bug Detection

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Incrementally parallelizing database transactions with thread-level speculation

ACM Transactions on Computer Systems (TOCS)
Scalable and reliable communication for hardware transactional memory

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Dynamic parallelization of single-threaded binary programs using speculative slicing

Proceedings of the 23rd international conference on Supercomputing
Boosting single-thread performance in multi-core systems through fine-grain multi-threading

Proceedings of the 36th annual international symposium on Computer architecture
Supporting speculative parallelization in the presence of dynamic data structures

PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Speculative parallelization using state separation and multiple value prediction

Proceedings of the 2010 international symposium on Memory management
Enhanced speculative parallelization via incremental recovery

Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Hardware support for multithreaded execution of loops with limited parallelism

PCI'05 Proceedings of the 10th Panhellenic conference on Advances in Informatics
Speculative parallelization: eliminating the overhead of failure

HPCC'07 Proceedings of the Third international conference on High Performance Computing and Communications
IBM Blue Gene/Q memory subsystem with speculative execution and transactional memory

IBM Journal of Research and Development

Quantified Score

Hi-index	0.00

Visualization

Abstract

Speculative thread-level parallelization is a promising way to speed up codes that compilers fail to parallelize. While several speculative parallelization schemes have been proposed for different machine sizes and types of codes, the results so far show that it is hard to deliver scalable speedups. Often, the problem is not true dependence violations, but sub-optimal architectural design. Consequently, we attempt to identify and eliminate major architectural bottlenecks that limit the scalability of speculative parallelization. The solutions that we propose are: low-complexity commit in constant time to eliminate the task commit bottleneck, a memory-based overflow area to eliminate stall due to speculative buffer overflow, and exploiting high-level access patterns to minimize speculation-induced traffic. To show that the resulting system is truly scalable, we perform simulations with up to 128 processors. With our optimizations, the speedups for 128 and 64 processors reach 63 and 48, respectively. The average speedup for 64 processors is 32, nearly four times higher than without our optimizations.