A high-performance 3-1 interlock collapsing ALU is presented: an ALU that can execute most execution interlocks (an instruction together with a dependent instruction that consumes its result) in a single machine cycle. We focus on reducing the Boolean equations that describe the device and on incorporating new mechanisms into the interlock collapsing ALU design. In particular, we focus on shortening the critical-path delay of the implementation. We show that, assuming a commonly available CMOS technology, the delay of the proposed device, measured in logic stages, equals the number of logic stages required to implement a 3-1 binary adder. The resulting implementation demonstrates that the proposed 3-1 interlock collapsing ALU can be designed to outperform existing interlock collapsing ALU schemes by a factor of at least two. Finally, we suggest that the proposed device can be used to implement multiple-instruction-issue machines, allowing interlocked instructions to be issued and executed in parallel, in a single machine cycle, without increasing the cycle time.
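The abstract does not spell out the datapath, so the following is only a minimal behavioral sketch of why interlock collapsing fits in 3-1 adder delay, not the paper's Boolean equations. It assumes a 32-bit two's-complement datapath and covers only add/subtract interlocks in which the first result feeds the first operand of the dependent instruction (the described ALU also handles other operation types, which are not modeled here). The pair `t = x ± y; r = t ± z` collapses into `r = x ± y ± z`, which a 3-1 adder evaluates in one pass: a carry-save (3:2) stage followed by a single carry-propagate add. The names `WIDTH`, `carry_save`, `add3`, `collapse_add_sub`, and `sequential` are hypothetical.

```python
WIDTH = 32                   # assumed datapath width
MASK = (1 << WIDTH) - 1


def carry_save(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 carry-save stage: bitwise full adders, no carry propagation.
    Per bit, a + b + c == s + 2*k."""
    s = a ^ b ^ c
    k = (a & b) | (a & c) | (b & c)
    return s & MASK, k & MASK


def add3(a: int, b: int, c: int, sub_b: bool = False, sub_c: bool = False) -> int:
    """Compute (a +/- b +/- c) mod 2**WIDTH with one 3:2 stage and one
    carry-propagate add. Subtraction uses x - y == x + ~y + 1."""
    if sub_b:
        b = ~b & MASK
    if sub_c:
        c = ~c & MASK
    s, k = carry_save(a, b, c)
    k = (k << 1) & MASK      # carries move one bit left; the LSB is now 0
    if sub_b:
        k |= 1               # the free LSB absorbs the first "+1"
    cin = 1 if sub_c else 0  # the adder's carry-in absorbs the second "+1"
    return (s + k + cin) & MASK  # single carry-propagate add


def collapse_add_sub(op1: str, op2: str, x: int, y: int, z: int) -> int:
    """Collapsed form of: t = x op1 y; r = t op2 z (ops in {'+', '-'})."""
    return add3(x, y, z, sub_b=(op1 == '-'), sub_c=(op2 == '-'))


def sequential(op1: str, op2: str, x: int, y: int, z: int) -> int:
    """Reference model: execute the two dependent instructions one after the other."""
    t = (x + y if op1 == '+' else x - y) & MASK
    return (t + z if op2 == '+' else t - z) & MASK


if __name__ == "__main__":
    print(collapse_add_sub('-', '+', 10, 3, 5))       # (10 - 3) + 5
    print(hex(collapse_add_sub('-', '-', 0, 1, 1)))   # (0 - 1) - 1 wraps around
```

The design point the sketch illustrates: because the 3:2 stage has a fixed delay of one full-adder level regardless of width, the collapsed pair costs only that level plus the carry-propagate add that a single instruction would already need, and the two "+1" terms for subtraction ride in slots that are otherwise unused.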