Limits of control flow on parallelism

Authors:
Monica S. Lam; Robert P. Wilson
Venue:
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Year:
1992

Abstract

This paper discusses three techniques useful in relaxing the constraints imposed by control flow on parallelism: control dependence analysis, executing multiple flows of control simultaneously, and speculative execution. We evaluate these techniques by using trace simulations to find the limits of parallelism for machines that employ different combinations of these techniques. We have three major results. First, local regions of code have limited parallelism, and control dependence analysis is useful in extracting global parallelism from different parts of a program. Second, a superscalar processor is fundamentally limited because it cannot execute independent regions of code concurrently. Higher performance can be obtained with machines, such as multiprocessors and dataflow machines, that can simultaneously follow multiple flows of control. Finally, without speculative execution to allow instructions to execute before their control dependences are resolved, only modest amounts of parallelism can be obtained for programs with complex control flow.
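
The kind of limit study the abstract describes can be illustrated with a toy trace simulation. The sketch below is not the authors' simulator. It assumes unit instruction latency, unlimited hardware resources, and a hand-annotated trace. The `Instr` record, the `parallelism` function, and the model labels (`base`, `cd_mf`, `spec`, `oracle`) are hypothetical names chosen for this example. Each model differs only in which branches an instruction must wait for before it can execute.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Instr:
    # Indices of earlier instructions whose results this one reads.
    data_deps: List[int] = field(default_factory=list)
    is_branch: bool = False
    # Index of the branch this instruction is control dependent on, if any.
    control_dep: Optional[int] = None
    # Only meaningful for branches: was the (assumed) prediction wrong?
    mispredicted: bool = False


def parallelism(trace: List[Instr], model: str) -> float:
    """Instructions divided by critical-path length for one machine model."""
    finish: List[int] = []   # completion cycle of each instruction
    last_branch = 0          # completion cycle of the latest branch seen so far
    last_mispredict = 0      # completion cycle of the latest mispredicted branch
    for ins in trace:
        # True data dependences constrain every model.
        ready = max((finish[d] for d in ins.data_deps), default=0)
        if model == "base":
            # Wait for every preceding branch in the trace.
            ready = max(ready, last_branch)
        elif model == "cd_mf":
            # Wait only for the branch this instruction depends on, so
            # control-independent regions can run concurrently.
            if ins.control_dep is not None:
                ready = max(ready, finish[ins.control_dep])
        elif model == "spec":
            # Speculate past correctly predicted branches; stall on mispredictions.
            ready = max(ready, last_mispredict)
        elif model != "oracle":
            raise ValueError(f"unknown model: {model}")
        done = ready + 1  # unit latency (assumption)
        finish.append(done)
        if ins.is_branch:
            last_branch = max(last_branch, done)
            if ins.mispredicted:
                last_mispredict = max(last_mispredict, done)
    return len(trace) / max(finish)


def demo_trace() -> List[Instr]:
    # Two control-independent regions, each guarded by its own branch.
    return [
        Instr(is_branch=True),                  # 0: branch guarding region A
        Instr(control_dep=0),                   # 1: a = ...
        Instr(data_deps=[1], control_dep=0),    # 2: b = f(a)
        Instr(is_branch=True),                  # 3: branch guarding region B
        Instr(control_dep=3),                   # 4: c = ...
        Instr(data_deps=[4], control_dep=3),    # 5: d = g(c)
    ]


if __name__ == "__main__":
    for model in ("base", "cd_mf", "spec", "oracle"):
        print(model, parallelism(demo_trace(), model))
```

On this six-instruction trace, the ordering of the models mirrors the abstract's qualitative claims: waiting for every prior branch gives the least parallelism, following multiple flows of control with control dependence information gives more, and speculation with no mispredictions reaches the data-dependence limit. The numbers themselves are artifacts of the toy trace and say nothing about the paper's measured results.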