The inherent low-level parallelism of Super-Scalar architectures plays an important role in the processing power these machines provide: independent functional units create opportunities for executing several machine operations simultaneously. From the hardware designer's viewpoint, it is very important to assess how each functional unit, and the way the units communicate, influences the overall performance of the machine. In particular, it is highly desirable to determine an upper bound on the number of additional functional units that still yield significant performance improvements. This work describes experiments carried out to assess the effect of alternative instruction issue mechanisms, multiple functional units, instruction queues, a common data bus, and other hardware solutions on the performance of Super-Scalar machines. The assessment was obtained by interpreting non-optimized object code of an actual processor on some basic machine models. The paper outlines the main aspects of the research and shows that speed-ups of up to 3.35 times were observed during the interpretation of benchmark programs.