High-Performance 3-1 Interlock Collapsing ALU's
IEEE Transactions on Computers
A device is presented that executes an interlocked fixed-point arithmetic logic unit (ALU) instruction in parallel with the instruction that causes the execution interlock. The device incorporates the design of a 3-1 ALU and can execute two's complement, unsigned binary, and binary logical operations. It is shown that the status of ALU operations performed on a 3-1 ALU can be determined in parallel, so the proposed device complies with the predetermined architectural behavior of single-instruction execution. The device requires no more logic stages than a 3-1 binary adder built from a carry-save adder (CSA) followed by a carry-lookahead adder (CLA). Design considerations for a commonly available CMOS technology are also reported; they indicate that the device will not increase the machine cycle of an implementation. It is suggested that the device can maintain full architectural compatibility.
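The interlock-collapsing idea described above can be illustrated with a small software model. The sketch below is not the paper's hardware design. It assumes a 32-bit datapath, covers only addition, and does not model status (carry or overflow) generation. All function names are hypothetical. It shows a dependent pair `a = b + c; d = a + e` evaluated two ways: sequentially, and with the second result computed directly from the three original operands by a carry-save step followed by one ordinary two-input add, which stands in for the CLA.

```python
MASK = 0xFFFFFFFF  # assumption: 32-bit datapath


def carry_save_add(x, y, z):
    """3-to-2 compression: returns (s, c) with x + y + z == s + c (mod 2**32)."""
    s = x ^ y ^ z                           # bitwise sum without carries
    c = ((x & y) | (x & z) | (y & z)) << 1  # majority bits, shifted into place
    return s & MASK, c & MASK


def add3(x, y, z):
    """Three-input add: a CSA stage, then one two-input add (stand-in for a CLA)."""
    s, c = carry_save_add(x, y, z)
    return (s + c) & MASK


def sequential(b, c, e):
    """Interlocked pair executed one after the other: a = b + c; d = a + e."""
    a = (b + c) & MASK
    d = (a + e) & MASK
    return a, d


def collapsed(b, c, e):
    """Same pair with the interlock collapsed: d no longer waits for a."""
    a = (b + c) & MASK   # first instruction, ordinary two-input add
    d = add3(b, c, e)    # second instruction, computed from the original operands
    return a, d
```

For any 32-bit operands the two functions return the same pair `(a, d)`, which mirrors the requirement that collapsed execution preserve the architectural results of single-instruction execution. Subtraction and logical operations, which the abstract also lists, would need additional operand conditioning that this sketch leaves out.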