An efficient algorithm for exploiting multiple arithmetic units

Authors:
R. M. Tomasulo
Affiliations:
Systems Development Division, Poughkeepsie, New York
Venue:
IBM Journal of Research and Development
Year:
1967

Citing 0
Cited 66

An Instruction Issuing Approach to Enhancing Performance in Multiple Functional Unit Processors

IEEE Transactions on Computers
Checkpoint repair for high-performance out-of-order execution machines

IEEE Transactions on Computers
OHMEGA: a VLSI superscalar processor architecture for numerical applications

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Comparing static and dynamic code scheduling for multiple-instruction-issue processors

MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Data threaded microarchitecture

ACM SIGARCH Computer Architecture News
Improving the Precise Interrupt Mechanism of Software-Managed TLB Miss Handlers

HiPC '01 Proceedings of the 8th International Conference on High Performance Computing
Algebraic Models of Superscalar Microprocessor Implementations: A Case Study

Proceedings of the ESPRIT Working Group 8533 on Prospects for Hardware Foundations: NADA - New Hardware Design Methods, Survey Chapters
Typing Assembly Programs with Explicit Forwarding

TACS '01 Proceedings of the 4th International Symposium on Theoretical Aspects of Computer Software
An EPIC Processor with Pending Functional Units

ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Processor Architectures for Multimedia Applications

Embedded Processor Design Challenges: Systems, Architectures, Modeling, and Simulation - SAMOS
A Java-Enabled DSP

Embedded Processor Design Challenges: Systems, Architectures, Modeling, and Simulation - SAMOS
A Comparison of Two Verification Methods for Speculative Instruction Execution

TACAS '00 Proceedings of the 6th International Conference on Tools and Algorithms for Construction and Analysis of Systems: Held as Part of the European Joint Conferences on the Theory and Practice of Software, ETAPS 2000
Realizing High IPC Using Time-Tagged Resource-Flow Computing

Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Verification of Infinite State Systems by Compositional Model Checking

CHARME '99 Proceedings of the 10th IFIP WG 10.5 Advanced Research Working Conference on Correct Hardware Design and Verification Methods
Microprocessors - 10 Years Back, 10 Years Ahead

Informatics - 10 Years Back. 10 Years Ahead.
Microarchitecture Verification by Compositional Model Checking

CAV '01 Proceedings of the 13th International Conference on Computer Aided Verification
Formal Verification of Complex Out-of-Order Pipelines by Combining Model-Checking and Theorem-Proving

CAV '02 Proceedings of the 14th International Conference on Computer Aided Verification
Efficient Interprocedural Data Placement Optimisation in a Parallel Library

LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Fred: An Architecture for a Self-Timed Decoupled Computer

ASYNC '96 Proceedings of the 2nd International Symposium on Advanced Research in Asynchronous Circuits and Systems
Performance enhancement of SISD processors

ISCA '79 Proceedings of the 6th annual symposium on Computer architecture
Correctness and equivalence of straight line microprograms

MICRO 6 Conference record of the 6th annual workshop on Microprogramming
Performance Study of a Multithreaded Superscalar Microprocessor

HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Cheap Out-of-Order Execution Using Delayed Issue

ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
Reducing the Energy of Speculative Instruction Schedulers

ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Principles of Timing Anomalies in Superscalar Processors

QSIC '05 Proceedings of the Fifth International Conference on Quality Software
Compacting register file via 2-level renaming and bit-partitioning

Microprocessors & Microsystems
Applying a constructivist and collaborative methodological approach in engineering education

Computers & Education
A Restructurable Computer System

IEEE Transactions on Computers
Overlapped Operation with Microprogramming

IEEE Transactions on Computers
A Multiple-Stream Registerless Shared-Resource Processor

IEEE Transactions on Computers
The Burroughs Scientific Processor (BSP)

IEEE Transactions on Computers
On the Effective Bandwidth of Parallel Memories

IEEE Transactions on Computers
The Memory System of a High-Performance Personal Computer

IEEE Transactions on Computers
An Optimal Algorithm for Scheduling Requests on Interleaved Memories for a Pipelined Processor

IEEE Transactions on Computers
The Degradation in Memory Utilization Due to Dependencies

IEEE Transactions on Computers
Instruction Issue Logic in Pipelined Supercomputers

IEEE Transactions on Computers
A Research-Oriented Dynamic Microprocessor

IEEE Transactions on Computers
An Investigation on Testing of Parallelized Code with OpenMP

IWOMP '07 Proceedings of the 3rd international workshop on OpenMP: A Practical Programming Model for the Multi-Core Era
Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation

Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation
Formal Verification of Gate-Level Computer Systems

CSR '09 Proceedings of the Fourth International Computer Science Symposium in Russia on Computer Science - Theory and Applications
A Comparison of Some Theoretical Models of Parallel Computation

IEEE Transactions on Computers
Application-aware prioritization mechanisms for on-chip networks

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Dynamic Malicious Code Detection Based on Binary Translator

CloudCom '09 Proceedings of the 1st International Conference on Cloud Computing
Decoupled state-execute architecture

ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
LPA: a first approach to the loop processor architecture

HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
Aérgia: exploiting packet latency slack in on-chip networks

Proceedings of the 37th annual international symposium on Computer architecture
OoOJava: an out-of-order approach to parallel programming

HotPar'10 Proceedings of the 2nd USENIX conference on Hot topics in parallelism
Using speculative functional units in high level synthesis

Proceedings of the Conference on Design, Automation and Test in Europe
An Instruction Fetch Unit for a High-Performance Personal Computer

IEEE Transactions on Computers
OoOJava: software out-of-order execution

Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
A pattern language for parallelizing irregular algorithms

Proceedings of the 2010 Workshop on Parallel Programming Patterns
Paxos replicated state machines as the basis of a high-performance data store

Proceedings of the 8th USENIX conference on Networked systems design and implementation
Trebuchet: exploring TLP with dataflow virtualisation

International Journal of High Performance Systems Architecture
Static speculation as post-link optimization for the Grid Alu processor

Euro-Par 2010 Proceedings of the 2010 conference on Parallel processing
Design and analysis of adaptive processor

ACM Transactions on Reconfigurable Technology and Systems (TRETS)
DOJ: dynamically parallelizing object-oriented programs

Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
A scalable, multi-thread, multi-issue array processor architecture for DSP applications based on extended Tomasulo scheme

SAMOS'06 Proceedings of the 6th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
Multi core design for chip level multiprocessing

Advanced Lectures on Software Engineering
A lazy, self-optimising parallel matrix library

FP'95 Proceedings of the 1995 international conference on Functional Programming
A case for exploiting subarray-level parallelism (SALP) in DRAM

Proceedings of the 39th Annual International Symposium on Computer Architecture
Virtual register renaming

ARCS'13 Proceedings of the 26th international conference on Architecture of Computing Systems
MP-Tomasulo: A Dependency-Aware Automatic Parallel Execution Engine for Sequential Programs

ACM Transactions on Architecture and Code Optimization (TACO)
Tuning the continual flow pipeline architecture

Proceedings of the 27th international ACM conference on supercomputing
Virtual register renaming: energy efficient substrate for continual flow pipelines

Proceedings of the 23rd ACM international conference on Great lakes symposium on VLSI
Tuning the continual flow pipeline architecture with virtual register renaming

ACM Transactions on Architecture and Code Optimization (TACO)
Scheduling directives: Accelerating shared-memory many-core processor execution

Parallel Computing

Abstract

This paper describes the methods employed in the floating-point area of the System/360 Model 91 to exploit the existence of multiple execution units. Basic to these techniques is a simple common data busing and register tagging scheme which permits simultaneous execution of independent instructions while preserving the essential precedences inherent in the instruction stream. The common data bus improves performance by efficiently utilizing the execution units without requiring specially optimized code. Instead, the hardware, by 'looking ahead' about eight instructions, automatically optimizes the program execution on a local basis. The application of these techniques is not limited to floating-point arithmetic or System/360 architecture. It may be used in almost any computer having multiple execution units and one or more 'accumulators.' Both of the execution units, as well as the associated storage buffers, multiple accumulators and input/output buses, are extensively checked.
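The register tagging and common data bus idea in the abstract can be illustrated with a small software model. The sketch below is not the Model 91 hardware or the paper's own notation: the names `Machine`, `Station`, `issue`, `complete`, and `broadcast` are hypothetical, timing and the number of execution units are not modeled, and the caller chooses which ready operation finishes next to stand in for differing unit latencies. It only shows how tags let independent operations finish out of program order while dependent ones wait for the value they need.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List, Optional, Union

# An operand is either a concrete value or the tag (a string) of the
# buffered operation that will eventually produce it.
Operand = Union[float, str]

OPS = {"add": operator.add, "sub": operator.sub,
       "mul": operator.mul, "div": operator.truediv}


@dataclass
class Station:
    """Hypothetical buffered operation waiting for its operands."""
    tag: str
    op: str
    src1: Operand
    src2: Operand

    def ready(self) -> bool:
        # Ready once neither operand is still a pending tag.
        return not isinstance(self.src1, str) and not isinstance(self.src2, str)


class Machine:
    """Toy model of tagged registers plus a single broadcast bus (assumed design)."""

    def __init__(self, regs: Dict[str, float]) -> None:
        self.value: Dict[str, float] = dict(regs)
        # For each register: tag of the pending producer, or None if valid.
        self.tag: Dict[str, Optional[str]] = {r: None for r in regs}
        self.stations: List[Station] = []
        self._next_id = 0

    def _read(self, reg: str) -> Operand:
        pending = self.tag[reg]
        return pending if pending is not None else self.value[reg]

    def issue(self, op: str, dst: str, a: str, b: str) -> str:
        """Buffer an operation; sources are captured as values or tags."""
        tag = f"RS{self._next_id}"
        self._next_id += 1
        self.stations.append(Station(tag, op, self._read(a), self._read(b)))
        self.tag[dst] = tag  # the destination now waits on this tag
        return tag

    def ready_tags(self) -> List[str]:
        return [s.tag for s in self.stations if s.ready()]

    def complete(self, tag: str) -> float:
        """Finish one ready operation and broadcast its result."""
        st = next(s for s in self.stations if s.tag == tag)
        if not st.ready():
            raise ValueError(f"{tag} is still waiting for an operand")
        result = OPS[st.op](st.src1, st.src2)
        self.stations.remove(st)
        self.broadcast(tag, result)
        return result

    def broadcast(self, tag: str, result: float) -> None:
        # Every waiting operand and register holding this tag takes the value.
        for s in self.stations:
            if s.src1 == tag:
                s.src1 = result
            if s.src2 == tag:
                s.src2 = result
        for reg, pending in self.tag.items():
            if pending == tag:
                self.value[reg] = result
                self.tag[reg] = None


def demo() -> Dict[str, float]:
    m = Machine({"F0": 0.0, "F2": 6.0, "F4": 3.0, "F6": 0.0, "F8": 1.0})
    div = m.issue("div", "F0", "F2", "F4")  # F0 = F2 / F4
    add = m.issue("add", "F6", "F0", "F8")  # F6 = F0 + F8, waits on div
    mul = m.issue("mul", "F8", "F2", "F4")  # F8 = F2 * F4, independent
    m.complete(mul)  # finishes first, out of program order
    m.complete(div)  # result is broadcast to the waiting add
    m.complete(add)
    return m.value
```

In `demo()`, the multiply overwrites `F8` before the earlier add has run, yet the add still uses the old `F8` because it captured that value at issue time. The final register contents match what strictly sequential execution would produce.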