Runtime parallelization of legacy code on a transactional memory system

Authors:
Matthew DeVuyst;Dean M. Tullsen;Seon Wook Kim
Affiliations:
University of California, San Diego;University of California, San Diego;Korea University
Venue:
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Year:
2011

Abstract

This paper proposes a new runtime parallelization technique, based on a dynamic optimization framework, to automatically parallelize single-threaded legacy programs. The technique relies heavily on the optimistic concurrency of transactional memory. The work addresses a number of challenges posed by this type of parallelization and quantifies the trade-offs of several design decisions: how to select good loops for parallelization, how to partition the iteration space among parallel threads, how to handle loop-carried dependencies, and how to transition from serial to parallel execution and back. The simulated implementation of runtime parallelization shows a potential speedup of 1.36 on the NAS benchmarks and 1.34 on the SPEC CPU2000 floating-point benchmarks when two cores are used for parallel execution.
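
One of the design decisions named in the abstract, partitioning a loop's iteration space among threads, can be illustrated with two generic schemes. The sketch below is only an illustration under stated assumptions: the function names `block_partition` and `cyclic_partition` and the `chunk` parameter are hypothetical, and the abstract does not say which partitioning the authors actually use.

```python
from typing import List


def block_partition(n_iters: int, n_threads: int) -> List[range]:
    """Hypothetical helper: give each thread one contiguous block of iterations."""
    base, extra = divmod(n_iters, n_threads)
    parts, start = [], 0
    for t in range(n_threads):
        # The first `extra` threads take one additional iteration.
        size = base + (1 if t < extra else 0)
        parts.append(range(start, start + size))
        start += size
    return parts


def cyclic_partition(n_iters: int, n_threads: int, chunk: int = 1) -> List[List[int]]:
    """Hypothetical helper: deal fixed-size chunks of iterations round-robin."""
    parts: List[List[int]] = [[] for _ in range(n_threads)]
    for chunk_idx, start in enumerate(range(0, n_iters, chunk)):
        end = min(start + chunk, n_iters)  # the last chunk may be short
        parts[chunk_idx % n_threads].extend(range(start, end))
    return parts


if __name__ == "__main__":
    # Two threads, matching the two-core configuration in the abstract.
    print(block_partition(10, 2))
    print(cyclic_partition(10, 2, chunk=2))
```

Which scheme suits a given loop depends on where its loop-carried dependencies fall, since that determines which iterations running on different threads can conflict.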