This paper describes the architecture for issuing multiple instructions per clock in the NonStop Cyclone Processor. Pairs of instructions are fetched and decoded by a dual two-stage prefetch pipeline and passed to a dual six-stage pipeline for execution. Dynamic branch prediction is used to reduce branch penalties. A unique microcode routine for each pair is stored in the large duplexed control store. The microcode controls parallel data paths optimized for executing the most frequent instruction pairs. Other features of the architecture include cache support for unaligned double-precision accesses, a virtually-addressed main memory, and a novel precise exception mechanism.
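The idea of selecting one microcode routine per instruction pair can be illustrated with a minimal sketch. This is not the Cyclone implementation: the opcode names, routine labels, and table contents are hypothetical, and the fallback to single issue for a pair with no entry is an assumption made only to keep the example complete, since the abstract does not describe that case.

```python
# Hypothetical control store: maps an opcode pair to one microcode routine.
# Opcode names and routine labels are invented for illustration only.
PAIR_ROUTINES = {
    ("LOAD", "ADD"): "ucode_load_add",
    ("LOAD", "LOAD"): "ucode_load_load",
    ("ADD", "STORE"): "ucode_add_store",
}


def issue(stream):
    """Return one (routine, instructions_issued) entry per clock.

    Assumption: a pair with no table entry falls back to issuing its
    first instruction alone; the source does not cover this case.
    """
    clocks = []
    i = 0
    while i < len(stream):
        pair = tuple(stream[i:i + 2])
        routine = PAIR_ROUTINES.get(pair) if len(pair) == 2 else None
        if routine is not None:
            # Both instructions are handled by a single paired routine.
            clocks.append((routine, 2))
            i += 2
        else:
            # Single issue with a per-opcode routine (hypothetical naming).
            clocks.append(("ucode_" + stream[i].lower(), 1))
            i += 1
    return clocks


def instructions_per_clock(stream):
    """Average issue rate over the stream, ignoring stalls and branches."""
    if not stream:
        return 0.0
    return len(stream) / len(issue(stream))
```

In this toy model, a stream made entirely of table pairs issues two instructions per clock, and every pair missing from the table lowers the average. That is one way to read the abstract's emphasis on optimizing the data paths for the most frequent pairs.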