ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Highly concurrent scalar processing
Highly concurrent scalar processing
Critical issues regarding HPS, a high performance microarchitecture
MICRO 18 Proceedings of the 18th annual workshop on Microprogramming
Instruction issue logic for high-performance, interruptable pipelined processors
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
A VLIW architecture for a trace Scheduling Compiler
IEEE Transactions on Computers - Special issue on architectural support for programming languages and operating systems
Limits on multiple instruction issue
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Comparing software and hardware schemes for reducing the cost of branches
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
An efficient method of computing static single assignment form
POPL '89 Proceedings of the 16th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Limits of instruction-level parallelism
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Global instruction scheduling for superscalar machines
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Boosting beyond static scheduling in a superscalar processor
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Implementation of precise interrupts in pipelined processors
ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
On the limits of program parallelism and its smoothability
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Extraction of massive instruction level parallelism
ACM SIGARCH Computer Architecture News
A partial evaluator for data flow graphs
PEPM '93 Proceedings of the 1993 ACM SIGPLAN symposium on Partial evaluation and semantics-based program manipulation
SPAA '93 Proceedings of the fifth annual ACM symposium on Parallel algorithms and architectures
Speculative execution and branch prediction on parallel machines
ICS '93 Proceedings of the 7th international conference on Supercomputing
Reducing indirect function call overhead in C++ programs
POPL '94 Proceedings of the 21st ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Height reduction of control recurrences for ILP processors
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Theoretical modeling of superscalar processor performance
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Improving the accuracy of static branch prediction using branch correlation
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Compiler transformations for high-performance computing
ACM Computing Surveys (CSUR)
Evaluating Performance Tradeoffs Between Fine-Grained and Coarse-Grained Alternatives
IEEE Transactions on Parallel and Distributed Systems
Unconstrained speculative execution with predicated state buffering
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
A macrotask-level unlimited speculative execution on multiprocessors
ICS '95 Proceedings of the 9th international conference on Supercomputing
Ordered multithreading: a novel technique for exploiting thread-level parallelism
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Increasing superscalar performance through multistreaming
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
The influence of branch prediction table interference on branch prediction scheme performance
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Disjoint eager execution: an optimal form of speculative execution
Proceedings of the 28th annual international symposium on Microarchitecture
Exceeding the dataflow limit via value prediction
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
The performance potential of data dependence speculation & collapsing
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
GPMB—software pipelining branch-intensive loops
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
A study on the number of memory ports in multiple instruction issue machines
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
The 16-fold way: a microparallel taxonomy
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Measuring limits of parallelism and characterizing its vulnerability to resource constraints
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Control flow prediction for dynamic ILP processors
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
An analysis of dynamic scheduling techniques for symbolic applications
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
From algorithm parallelism to instruction-level parallelism: an encode-decode chain using prefix-sum
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Improving superscalar instruction dispatch and issue by exploiting dynamic code sequences
Proceedings of the 24th annual international symposium on Computer architecture
Evaluation of scheduling techniques on a SPARC-based VLIW testbed
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Tuning compiler optimizations for simultaneous multithreading
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Highly accurate data value prediction using hybrid predictors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Available paralellism in video applications
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The potential of data value speculation to boost ILP
ICS '98 Proceedings of the 12th international conference on Supercomputing
Speculative execution model with duplication
ICS '98 Proceedings of the 12th international conference on Supercomputing
Simultaneous multithreading: maximizing on-chip parallelism
25 years of the international symposia on Computer architecture (selected papers)
Load latency tolerance in dynamically scheduled processors
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Space-time scheduling of instruction-level parallelism on a raw machine
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Data speculation support for a chip multiprocessor
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Dynamic vectorization: a mechanism for exploiting far-flung ILP in ordinary programs
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Reducing branch misprediction penalties via dynamic control independence detection
ICS '99 Proceedings of the 13th international conference on Supercomputing
A comparison of scalable superscalar processors
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
The Superthreaded Processor Architecture
IEEE Transactions on Computers
Control independence in trace processors
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Optimizations and oracle parallelism with dynamic translation
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
IEEE Transactions on Computers
A code-motion pruning technique for global scheduling
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Understanding the backward slices of performance degrading instructions
Proceedings of the 27th annual international symposium on Computer architecture
Tuning Compiler Optimizations for Simultaneous Multithreading
International Journal of Parallel Programming - Special issue on the 30th annual ACM/IEEE international symposium on microarchitecture, part II
OS and compiler considerations in the design of the IA-64 architecture
ACM SIGPLAN Notices
A study of slipstream processors
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Towards a first vertical prototyping of an extremely fine-grained parallel programming approach
Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
OS and compiler considerations in the design of the IA-64 architecture
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Scheduled Dataflow: Execution Paradigm, Architecture, and Performance Evaluation
IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
Disjoint Eager Execution: what it is / what it is not
ACM SIGARCH Computer Architecture News
On the Boosting of Instruction Scheduling by Renaming
The Journal of Supercomputing
Sensitivity analysis of a superscalar processor model
CRPIT '02 Proceedings of the seventh Asia-Pacific conference on Computer systems architecture
Architectural differences of efficient sequential and parallel computers
Journal of Systems Architecture: the EUROMICRO Journal
Exploiting Value Locality to Exceed the Dataflow Limit
International Journal of Parallel Programming
Branch Effect Reduction Techniques
Computer
A survey of processors with explicit multithreading
ACM Computing Surveys (CSUR)
A finite state machine based format model of software pipelined loops with conditions
Progress in computer research
Multiscalar Execution along a Single Flow of Control
ICPP '97 Proceedings of the international Conference on Parallel Processing
A Feasibility Study of Hierarchical Multithreading
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Design and Evaluation of Speculative Multi-threading with Selective Multi-Path Execution
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Limits of Task-Based Parallelism in Irregular Applications
ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
Limits and Graph Structure of Available Instruction-Level Parallelism (Research Note)
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Realizing High IPC Using Time-Tagged Resource-Flow Computing
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
A Fine-Grain Threaded Abstract Machine
PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques
A PDG-based Tool and its Use in Analyzing Program Control Dependences
PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques
Selective Scheduling Framework for Speculative Operations in VLIW and Superscalar Processors
PACT '93 Proceedings of the IFIP WG10.3. Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism
Realizing high IPC through a scalable memory-latency tolerant multipath microarchitecture
ACM SIGARCH Computer Architecture News
Exploring Microprocessor Architectures for Gigascale Integration
ARVLSI '99 Proceedings of the 20th Anniversary Conference on Advanced Research in VLSI
The Ultrascalar Processor-An Asymptotically Scalable Superscalar Microarchitecture
ARVLSI '99 Proceedings of the 20th Anniversary Conference on Advanced Research in VLSI
Instruction-level parallel processors-dynamic and static scheduling tradeoffs
PAS '97 Proceedings of the 2nd AIZU International Symposium on Parallel Algorithms / Architecture Synthesis
A Constructive Method for Exploiting Code Motion
ISSS '96 Proceedings of the 9th international symposium on System synthesis
A Quantitative Code Analysis of Scientific Systolic Programs: DSP vs. Matrix Algorithms
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Balancing Reuse Opportunities and Performance Gains with Subblock Value Reuse
IEEE Transactions on Computers
Programming skills for a changing world: back to the basics
Journal of Computing Sciences in Colleges
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Instruction Scheduling for Low Power
Journal of VLSI Signal Processing Systems
Specialized Dynamic Optimizations for High-Performance Energy-Efficient Microarchitecture
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Characterizing a new class of threads in scientific applications for high end supercomputers
Proceedings of the 18th annual international conference on Supercomputing
Power Awareness through Selective Dynamically Optimized Traces
Proceedings of the 31st annual international symposium on Computer architecture
A scalable, clustered SMT processor for digital signal processing
MEDEA '03 Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications , systems and architecture
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Area and System Clock Effects on SMT/CMP Throughput
IEEE Transactions on Computers
The Potential of Computation Regrouping for Improving Locality
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Control-Flow Independence Reuse via Dynamic Vectorization
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Chip multi-processor scalability for single-threaded applications
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Proceedings of the 33rd annual international symposium on Computer Architecture
Code transformation strategies for extensible embedded processors
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Hardware support for software controlled multithreading
ACM SIGARCH Computer Architecture News
Ginger: control independence using tag rewriting
Proceedings of the 34th annual international symposium on Computer architecture
Inter-cluster communication in VLIW architectures
ACM Transactions on Architecture and Code Optimization (TACO)
A hybrid closed queuing network approach to model dataflow in networked distributed processors
Computer Communications
Modeling optimistic concurrency using quantitative dependence analysis
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Communication optimizations for global multi-threaded instruction scheduling
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Communications of the ACM - Web science
Journal of Signal Processing Systems - Special Issue: Embedded computing systems for DSP
A closed queuing network model with multiple servers for multi-threaded architecture
Computer Communications
Visualizing potential parallelism in sequential programs
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
A hybrid open queuing network model approach for multi-threaded dataflow architecture
Computer Communications
SPARTAN: A software tool for Parallelization Bottleneck Analysis
IWMSE '09 Proceedings of the 2009 ICSE Workshop on Multicore Software Engineering
A hybrid closed queuing network model for multi-threaded dataflow architecture
Computers and Electrical Engineering
The potential of using dynamic information flow analysis in data value prediction
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
EURO-PDP'00 Proceedings of the 8th Euromicro conference on Parallel and distributed processing
Parallelism and data movement characterization of contemporary application classes
Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
Proceedings of the 8th ACM International Conference on Computing Frontiers
Kismet: parallel speedup estimates for serial programs
Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Leveraging Strength-Based Dynamic Information Flow Analysis to Enhance Data Value Prediction
ACM Transactions on Architecture and Code Optimization (TACO)
Limits of parallelism using dynamic dependency graphs
WODA '09 Proceedings of the Seventh International Workshop on Dynamic Analysis
PARROT: power awareness through selective dynamically optimized traces
PACS'03 Proceedings of the Third international conference on Power - Aware Computer Systems
SAMOS'06 Proceedings of the 6th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
Dynamic trace-based analysis of vectorization potential of applications
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Disjoint out-of-order execution processor
ACM Transactions on Architecture and Code Optimization (TACO)
Automatic extraction of multi-objective aware pipeline parallelism using genetic algorithms
Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Exposing ILP in custom hardware with a dataflow compiler IR
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Beyond reuse distance analysis: Dynamic analysis for characterization of data locality potential
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.01 |
This paper discusses three techniques useful in relaxing the constraints imposed by control flow on parallelism: control dependence analysis, executing multiple flows of control simultaneously, and speculative execution. We evaluate these techniques by using trace simulations to find the limits of parallelism for machines that employ different combinations of these techniques. We have three major results. First, local regions of code have limited parallelism, and control dependence analysis is useful in extracting global parallelism from different parts of a program. Second, a superscalar processor is fundamentally limited because it cannot execute independent regions of code concurrently. Higher performance can be obtained with machines, such as multiprocessors and dataflow machines, that can simultaneously follow multiple flows of control. Finally, without speculative execution to allow instructions to execute before their control dependences are resolved, only modest amounts of parallelism can be obtained for programs with complex control flow.