An Instruction Issuing Approach to Enhancing Performance in Multiple Functional Unit Processors
IEEE Transactions on Computers
The performance potential of multiple functional unit processors
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Available instruction-level parallelism for superscalar and superpipelined machines
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Limits on multiple instruction issue
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Journal of Parallel and Distributed Computing
Limits of instruction-level parallelism
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Single instruction stream parallelism is greater than two
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Limits of control flow on parallelism
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
MICRO 15 Proceedings of the 15th annual workshop on Microprogramming
MICRO 14 Proceedings of the 14th annual workshop on Microprogramming
On program restructuring, scheduling, and communication for parallel processor systems
On program restructuring, scheduling, and communication for parallel processor systems
Architectural differences of efficient sequential and parallel computers
Journal of Systems Architecture: the EUROMICRO Journal
Hi-index | 0.00 |
Recent simulation based studies suggest that while superpipelines and superscalars are equally capable of exploiting fine grained concurrency, multiprocessors are better at exploiting coarse grained parallelism. An analytical model that is more flexible and less costly in terms of run time than simulation, is proposed as a tool for analyzing the tradeoff between superpipelined processors, superscalar processors, and multiprocessors. The duality of superpipelines and superscalars is examined in detail. The performance limit for these systems has been derived and it supports the fetch bottleneck observation of previous researchers. Common characteristics of utilization curves for such systems are examined. Combined systems, such as superpipelined multiprocessors and superscalar multiprocessors, are also analyzed. The model shows that the number of pipelines (or processors) at which the maximum throughput is obtained is, as memory access time increases, increasingly sensitive to the ratio of memory access time to network access delay. Further, as a function of interiteration dependence distance, optimum throughput is shown to vary nonlinearly, whereas the corresponding optimum number of processors varies linearly. The predictions from the analytical model agree with similar results published using simulation based techniques