SIMP is a novel multiple instruction-pipeline parallel architecture. It aims to improve the performance of SISD processors substantially by exploiting both temporal and spatial parallelism while preserving program compatibility. The degree of performance enhancement achieved by SIMP depends on (i) how multiple instructions are supplied continuously, and (ii) how data and control dependencies are resolved effectively. We have devised new techniques for instruction fetch and dependency resolution. The instruction fetch mechanism employs the following schemes: (i) prefetching multiple instructions with the help of branch prediction, (ii) squashing instructions selectively, and (iii) providing multiple conditional modes as a result. The dependency resolution mechanism permits out-of-order execution of a sequential instruction stream. Our out-of-order execution model is based on Tomasulo's algorithm, which has been used in single instruction-pipeline processors. However, it is greatly extended and adapted to multiple instruction pipelining by (i) detecting and identifying multiple dependencies simultaneously, (ii) alleviating the effects of control dependencies with both eager execution and advance execution, and (iii) ensuring a precise machine state in the presence of branches and interrupts. By taking advantage of these techniques, SIMP is one of the most promising architectures for the coming generation of high-speed single processors.
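To make the idea of identifying multiple dependencies at once more concrete, here is a minimal Python sketch of Tomasulo-style tag renaming applied to a group of instructions issued together. It is an illustration under stated assumptions, not SIMP's actual mechanism: the names `Renamed` and `rename_group`, the instruction encoding, and the tag numbering are all hypothetical, and the sequential loop only models the result that hardware would compute in parallel.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: (destination register, source registers).
Instr = Tuple[str, Tuple[str, ...]]


@dataclass
class Renamed:
    tag: int                             # tag allocated for this instruction's result
    sources: Tuple[Optional[int], ...]   # producer tag per source; None = value already in the register file


def rename_group(group: List[Instr], table: Dict[str, int], next_tag: int):
    """Resolve the dependencies of a whole issue group against the rename table.

    `table` maps a register name to the tag of the in-flight instruction that
    will write it. Earlier instructions in the group update the table before
    later ones read it, so dependencies inside the group are caught as well.
    """
    renamed = []
    for dest, srcs in group:
        # Read sources first, so an instruction never depends on itself.
        deps = tuple(table.get(reg) for reg in srcs)
        table[dest] = next_tag           # later readers of `dest` wait on this tag
        renamed.append(Renamed(next_tag, deps))
        next_tag += 1
    return renamed, next_tag


# Three instructions issued in the same cycle (assumed example program):
#   r1 = r2 op r3 ; r4 = r1 op r2 ; r1 = r4 op r1
group = [("r1", ("r2", "r3")), ("r4", ("r1", "r2")), ("r1", ("r4", "r1"))]
table: Dict[str, int] = {}
renamed, next_tag = rename_group(group, table, 0)
for r in renamed:
    print(r)
```

In this example the second instruction waits on the tag of the first, and the third waits on both earlier tags, while its own write to `r1` gets a fresh tag, so the reuse of `r1` creates no false dependency.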