On the combination of hardware and software concurrency extraction methods

Authors:
Augustus K. Uht;Constantine D. Polychronopoulos;John F. Kolen
Affiliations:
University of California, San Diego, Dept. of Computer Science and Engineering, C-014, La Jolla, California and University of California at San Diego, and the Center for Supercomputing Research an ...;University of Illinois at Urbana-Champaign, Center for Supercomputing Research and Development, Urbana, Illinois;University of California, San Diego, Dept. of Computer Science and Engineering, C-014, La Jolla, California and Department of Computer and Information Science, Ohio State University, Columbus, Ohi ...
Venue:
MICRO 20 Proceedings of the 20th annual workshop on Microprogramming
Year:
1987

Citing 14

An Instruction Issuing Approach to Enhancing Performance in Multiple Functional Unit Processors
IEEE Transactions on Computers

HPSm, a high performance restricted data flow architecture having minimal functionality
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture

HPS, a new microarchitecture: rationale and introduction
MICRO 18 Proceedings of the 18th annual workshop on Microprogramming

Processor Allocation for Horizontal and Vertical Parallelism and Related Speedup Bounds
IEEE Transactions on Computers

Guided self-scheduling: A practical scheduling scheme for parallel supercomputers
IEEE Transactions on Computers

Incremental performance contributions of hardware concurrency extraction techniques
Proceedings of the 1st International Conference on Supercomputing

Utilizing Multidimensional Loop Parallelism on Large Scale Parallel Processor Systems
IEEE Transactions on Computers

Look-Ahead Processors
ACM Computing Surveys (CSUR)

A Survey of Parallel Machine Organization and Programming
ACM Computing Surveys (CSUR)

Structure of Computers and Computations

Representation and detection of concurrency using ordering-matrices.

Speedup of ordinary programs

Hardware extraction of low-level concurrency from sequential instruction streams (parallelism, implementation, architecture, dependencies, semantics)

On program restructuring, scheduling, and communication for parallel processor systems

Cited 10

Compiler Optimizations for Enhancing Parallelism and Their Impact on Architecture Design
IEEE Transactions on Computers - Special issue on architectural support for programming languages and operating systems

Requirements for optimal execution of loops with tests
ICS '88 Proceedings of the 2nd international conference on Supercomputing

Modeling the effects of instruction queue loading on a static instruction stream micro-architecture
MICRO 21 Proceedings of the 21st annual workshop on Microprogramming and microarchitecture

A model for microarchitecture structure evaluation
MICRO 22 Proceedings of the 22nd annual workshop on Microprogramming and microarchitecture

A Theory of Reduced and Minimal Procedural Dependencies
IEEE Transactions on Computers

Comparing static and dynamic code scheduling for multiple-instruction-issue processors
MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture

Concurrency Extraction Via Hardware Methods Executing the Static Instruction Stream
IEEE Transactions on Computers

Extraction of massive instruction level parallelism
ACM SIGARCH Computer Architecture News

Ideograph/Ideogram: framework/hardware for eager evaluation
MICRO 23 Proceedings of the 23rd annual workshop and symposium on Microprogramming and microarchitecture

Requirements for Optimal Execution of Loops with Tests
IEEE Transactions on Parallel and Distributed Systems

Abstract

It has been shown that parallelism is a very promising alternative for enhancing computer performance. Parallelism, however, adds considerable complexity to the programming effort. This has led to the development of automatic concurrency extraction techniques. Prior work has demonstrated that static program restructuring via compiler-based techniques provides a large degree of parallelism to the target machine. Purely hardware-based extraction techniques (without software preprocessing) have also demonstrated significant, though lesser, degrees of parallelism. This paper considers the performance effects of combining hardware and software techniques. The concurrency extracted from a given set of benchmarks by each technique separately, and by both together, is determined via simulations and/or analysis. The “common parallelism” extracted by both methods is thus also considered, using new metrics. The analytic techniques for predicting the performance of specific programs are also described.