This paper describes the methods employed in the floating-point area of the System/360 Model 91 to exploit the existence of multiple execution units. Basic to these techniques is a simple common data busing and register tagging scheme which permits simultaneous execution of independent instructions while preserving the essential precedences inherent in the instruction stream. The common data bus improves performance by efficiently utilizing the execution units without requiring specially optimized code. Instead, the hardware, by 'looking ahead' about eight instructions, automatically optimizes the program execution on a local basis. The application of these techniques is not limited to floating-point arithmetic or System/360 architecture. It may be used in almost any computer having multiple execution units and one or more 'accumulators.' Both of the execution units, as well as the associated storage buffers, multiple accumulators and input/output buses, are extensively checked.
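The register tagging and common data bus idea can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the Model 91's actual design: the names `TaggedMachine`, `Station`, `issue`, `execute`, and `broadcast` are hypothetical, there is no timing or limit on the number of waiting operations, and the caller chooses the execution order by hand. It shows only the core mechanism: a register records the tag of its pending producer, waiting operations hold tags in place of missing operands, and a single broadcast delivers each result to everything holding the matching tag.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# An operand is ("val", number) when ready, or ("tag", n) while it is still
# being produced by the waiting operation that carries tag n.
Operand = Tuple[str, float]

OPS: Dict[str, Callable[[float, float], float]] = {
    "add": lambda x, y: x + y,
    "mul": lambda x, y: x * y,
}


@dataclass
class Station:
    """A waiting operation (hypothetical model of a reservation station)."""
    tag: int
    op: str
    a: Operand
    b: Operand


class TaggedMachine:
    """Toy model of register tagging with a broadcast result bus."""

    def __init__(self, regs: Dict[str, float]) -> None:
        self.value = dict(regs)                       # register -> value
        self.pending: Dict[str, Optional[int]] = {r: None for r in regs}
        self.stations: List[Station] = []
        self.next_tag = 1

    def _read(self, reg: str) -> Operand:
        # Capture the value if it is ready, otherwise the producer's tag.
        t = self.pending[reg]
        return ("tag", t) if t is not None else ("val", self.value[reg])

    def issue(self, op: str, dest: str, src1: str, src2: str) -> int:
        st = Station(self.next_tag, op, self._read(src1), self._read(src2))
        self.pending[dest] = st.tag   # dest now waits on this operation
        self.stations.append(st)
        self.next_tag += 1
        return st.tag

    def execute(self, tag: int) -> None:
        st = next(s for s in self.stations if s.tag == tag)
        if st.a[0] != "val" or st.b[0] != "val":
            raise RuntimeError("operands not ready")
        result = OPS[st.op](st.a[1], st.b[1])
        self.stations.remove(st)
        self.broadcast(tag, result)

    def broadcast(self, tag: int, result: float) -> None:
        # One broadcast updates every waiting operand and register
        # that still holds this tag.
        for s in self.stations:
            if s.a == ("tag", tag):
                s.a = ("val", result)
            if s.b == ("tag", tag):
                s.b = ("val", result)
        for reg, t in self.pending.items():
            if t == tag:
                self.value[reg] = result
                self.pending[reg] = None


def demo() -> TaggedMachine:
    m = TaggedMachine({"F0": 0.0, "F2": 2.0, "F4": 3.0, "F6": 0.0})
    t1 = m.issue("mul", "F0", "F2", "F4")  # F0 = F2 * F4
    t2 = m.issue("add", "F6", "F0", "F2")  # F6 = F0 + F2, waits on t1
    t3 = m.issue("add", "F0", "F4", "F2")  # F0 = F4 + F2, independent of t1
    m.execute(t3)  # completes first, out of program order
    m.execute(t1)  # result reaches t2 only; F0 no longer holds tag t1
    m.execute(t2)
    return m
```

Calling `demo()` leaves `F0` at 5.0 and `F6` at 8.0, the same values sequential execution would produce, even though the third instruction ran first. The second instruction captured the tag of the multiply at issue time, so it receives the multiply's result from the broadcast, while `F0` keeps the value of the later write because its tag had already moved on.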