Evaluating Performance Tradeoffs Between Fine-Grained and Coarse-Grained Alternatives

Authors:
Pradeep K. Dubey; George B. Adams, III; Michael J. Flynn
Affiliations:
IBM T. J. Watson Research Center, Yorktown Heights, NY; Purdue Univ., West Lafayette, IN; Stanford Univ., Stanford, CA
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
1995

Abstract

Recent simulation-based studies suggest that while superpipelines and superscalars are equally capable of exploiting fine-grained concurrency, multiprocessors are better at exploiting coarse-grained parallelism. An analytical model, more flexible and less costly in run time than simulation, is proposed as a tool for analyzing the tradeoffs among superpipelined processors, superscalar processors, and multiprocessors. The duality of superpipelines and superscalars is examined in detail. A performance limit for these systems is derived, and it supports the fetch-bottleneck observation of previous researchers. Common characteristics of the utilization curves for such systems are examined. Combined systems, such as superpipelined multiprocessors and superscalar multiprocessors, are also analyzed. The model shows that, as memory access time increases, the number of pipelines (or processors) at which maximum throughput is obtained becomes increasingly sensitive to the ratio of memory access time to network access delay. Further, optimum throughput is shown to vary nonlinearly with interiteration dependence distance, whereas the corresponding optimum number of processors varies linearly. The predictions of the analytical model agree with similar published results obtained using simulation-based techniques.