The program dependence graph and its use in optimization. ACM Transactions on Programming Languages and Systems (TOPLAS).
Limits of instruction-level parallelism. ASPLOS IV: Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems.
Limits of control flow on parallelism. ISCA '92: Proceedings of the 19th Annual International Symposium on Computer Architecture.
Dynamic dependency analysis of ordinary programs. ISCA '92: Proceedings of the 19th Annual International Symposium on Computer Architecture.
Value locality and load value prediction. Proceedings of the Seventh International Conference on Architectural Support for Programming Languages and Operating Systems.
Lazy threads: implementing a fast parallel call. Journal of Parallel and Distributed Computing, special issue on multithreading for multiprocessors.
A study of branch prediction strategies. 25 Years of the International Symposia on Computer Architecture (Selected Papers).
The limits of instruction level parallelism in SPEC95 applications. ACM SIGARCH Computer Architecture News, special issue on the Interact-3 workshop.
A scalable approach to thread-level speculation. Proceedings of the 27th Annual International Symposium on Computer Architecture.
Dhrystone: a synthetic systems programming benchmark. Communications of the ACM.
Optimizing Compilers for Modern Architectures: A Dependence-Based Approach.
Polaris: Improving the Effectiveness of Parallelizing Compilers. LCPC '94: Proceedings of the 7th International Workshop on Languages and Compilers for Parallel Computing.
Limits and Graph Structure of Available Instruction-Level Parallelism (Research Note). Euro-Par '00: Proceedings of the 6th International Euro-Par Conference on Parallel Processing.
Automatic Thread Extraction with Decoupled Software Pipelining. Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture.
MiBench: A free, commercially representative embedded benchmark suite. WWC-4: Proceedings of the 2001 IEEE International Workshop on Workload Characterization.
QEMU, a fast and portable dynamic translator. ATEC '05: Proceedings of the USENIX Annual Technical Conference.
Global Multi-Threaded Instruction Scheduling. Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture.
Towards an adaptable multiple-ISA reconfigurable processor. ARC '11: Proceedings of the 7th International Conference on Reconfigurable Computing: Architectures, Tools and Applications.
International Journal of Reconfigurable Computing, special issue on selected papers from the 17th Reconfigurable Architectures Workshop (RAW 2010).
Dynamic trace-based analysis of vectorization potential of applications. Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation.
Mixing static and dynamic strategies for high performance and low area reconfigurable systems. International Journal of High Performance Systems Architecture.
Towards a multiple-ISA embedded system. Journal of Systems Architecture: the EUROMICRO Journal.
Beyond reuse distance analysis: Dynamic analysis for characterization of data locality potential. ACM Transactions on Architecture and Code Optimization (TACO).
The advance of multi-core processors has led to renewed interest in extracting parallelism from programs. It is sometimes useful to know how much parallelism is exploitable in the limit for general programs, in order to put the speedups of various parallelisation techniques into perspective. Wall's study [19] was one of the first to examine the limits of parallelism in detail. We present an extension of Wall's analysis, constructing Dynamic Dependency Graphs from execution traces of a number of benchmark programs; this allows better visualisation of the types of dependencies that limit parallelism, as well as greater flexibility in transforming the graphs when exploring possible optimisations. We confirm some of the results of Wall and subsequent studies, including the finding that average available parallelism is often above 100, but only when control dependencies are effectively resolved and memory is renamed. We also study how certain compiler artifacts affect the limits of parallelism. In particular, we show that using a spaghetti stack, a technique that implicitly renames stack memory and breaks chains of true dependencies on the stack pointer, can double the potential parallelism.
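To make the trace-based analysis concrete, the following is a minimal sketch of how a limit-study figure can be computed from a Dynamic Dependency Graph; it is illustrative code, not the paper's tool. The TraceEntry format, the unit instruction latency, the treatment of every write as defining a fresh value (modelling perfect memory renaming), and the omission of control dependencies (modelling perfect branch resolution) are all simplifying assumptions. Available parallelism is reported as the number of dynamic instructions divided by the critical-path length of the true-dependence graph.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>
#include <vector>

// One dynamic instruction from an execution trace: the register and
// memory locations it reads and writes (this format is an assumption).
struct TraceEntry {
    uint64_t id;                   // dynamic instruction number
    std::vector<uint64_t> reads;   // locations read
    std::vector<uint64_t> writes;  // locations written
};

// Keep only true (read-after-write) dependencies: each write defines a
// fresh value, so anti- and output dependencies vanish (perfect
// renaming), and control dependencies are ignored (perfect branch
// resolution). Returns trace length / critical-path length.
double available_parallelism(const std::vector<TraceEntry>& trace) {
    std::unordered_map<uint64_t, uint64_t> last_writer; // location -> id
    std::unordered_map<uint64_t, uint64_t> depth;       // id -> path length
    uint64_t critical_path = 0;
    for (const TraceEntry& e : trace) {
        uint64_t d = 1;                                 // unit latency
        for (uint64_t loc : e.reads) {
            auto it = last_writer.find(loc);
            if (it != last_writer.end())                // RAW edge found
                d = std::max(d, depth[it->second] + 1);
        }
        for (uint64_t loc : e.writes)
            last_writer[loc] = e.id;
        depth[e.id] = d;
        critical_path = std::max(critical_path, d);
    }
    return trace.empty() ? 0.0
                         : double(trace.size()) / double(critical_path);
}
```

Tightening the model, for example by adding edges for unresolved branches or by keeping anti- and output dependencies, can only lengthen the critical path, which is why figures above 100 require both effective control resolution and memory renaming.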
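The spaghetti-stack observation can be sketched in the same spirit. The fragment below is a hypothetical illustration, not the paper's implementation; Frame, push_frame, and pop_frame are invented names. It shows the essential change: each call frame is allocated independently and linked to its parent, so there is no single stack pointer whose adjustment on every call and return forms a serial chain of true dependencies.

```cpp
// Conventional stack: every call computes sp = sp - framesize and every
// return undoes it, so all calls serialise on one register. A spaghetti
// (cactus) stack replaces that arithmetic with a parent link per frame.
struct Frame {
    Frame* parent;      // dynamic chain link instead of SP arithmetic
    long locals[16];    // per-call local storage (size is illustrative)
};

// Each call gets a freshly allocated frame; stack memory is implicitly
// renamed, so writes into one frame cannot conflict with reuse of the
// same stack addresses by a sibling call.
Frame* push_frame(Frame* parent) {
    return new Frame{parent, {}};
}

// Frames need not be released in LIFO order, so independent call
// subtrees can proceed and retire in parallel.
void pop_frame(Frame* f) {
    delete f;
}
```

In the Dynamic Dependency Graph this removes the long chain of true dependencies through the stack pointer, which is the mechanism behind the roughly twofold increase in potential parallelism reported above.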