Exploiting parallel microprocessor microarchitectures with a compiler code generator

Authors:
W. W. Hwu; P. P. Chang
Affiliations:
Univ. of Illinois, Urbana; Univ. of Illinois, Urbana
Venue:
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Year:
1988

Citing 10
Cited 11

Bulldog: a compiler for VLSI architectures

Bulldog: a compiler for VLSI architectures
Compilers: principles, techniques, and tools

Compilers: principles, techniques, and tools
Performance evaluation of multiple register sets

ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
HPSm2: A refined single-chip microengine

Proceedings of the Twenty-First Annual Hawaii International Conference on Architecture Track
Exploiting horizontal and vertical concurrency via the HPSm microprocessor

MICRO 20 Proceedings of the 20th annual workshop on Microprogramming
Postpass Code Optimization of Pipeline Constraints

ACM Transactions on Programming Languages and Systems (TOPLAS)
Dependence graphs and compiler optimizations

POPL '81 Proceedings of the 8th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
A Workbench for Computer Architects

IEEE Design & Test
Hpsm: exploiting concurrency to achieve high performance in a single-chip microarchitecture

Hpsm: exploiting concurrency to achieve high performance in a single-chip microarchitecture
Measuring the Parallelism Available for Very Long Instruction Word Architectures

IEEE Transactions on Computers

Trace selection for compiling large C application programs to microcode

MICRO 21 Proceedings of the 21st annual workshop on Microprogramming and microarchitecture
Experimental analysis of communication/data-conditional aspects of a mixed-mode parallel architecture via synthetic computations

Proceedings of the 1990 ACM/IEEE conference on Supercomputing
The Marion system for retargetable instruction scheduling

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
IMPACT: an architectural framework for multiple-instruction-issue processors

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
The effect on RISC performance of register set size and structure versus code generation strategy

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
A parallel pipelined processor with conditional instruction execution

ACM SIGARCH Computer Architecture News - Symposium on parallel algorithms and architectures
Code scheduling for VLIW/superscalar processors with limited register files

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
IMPACT: an architectural framework for multiple-instruction-issue processors

25 years of the international symposia on Computer architecture (selected papers)
Code generation of nested loops for DSP processors with heterogeneous registers and structural pipelining

ACM Transactions on Design Automation of Electronic Systems (TODAES)
A brief survey of papers on scheduling for pipelined processors

ACM SIGPLAN Notices
The Importance of Prepass Code Scheduling for Superscalar and Superpipelined Processors

IEEE Transactions on Computers

Abstract

With advances in VLSI technology, microprocessor designers can provide more microarchitectural parallelism to increase performance. We have identified four major forms of such parallelism: multiple microoperations issued per cycle, multiple result distribution buses, multiple execution units, and pipelined execution units. The experiments reported in this paper address two important issues: the effects of these forms and the appropriate balance among them. A central microarchitecture is identified as the basis for comparison. We vary each form of microarchitectural parallelism separately in the central microarchitecture to measure its individual effect on performance. In addition, we vary two forms of microarchitectural parallelism in the central microarchitecture to derive an appropriate balance between them. To make fair comparisons, our compiler generates different code sequences optimized for each microarchitectural configuration. For a given set of technology constraints, these experiments can be used to derive a cost-effective microarchitecture that executes a given set of workload programs at high speed.
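The experimental design described in the abstract (one form varied at a time around a central configuration, then two forms varied in combination) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the parameter names, the central values, the candidate ranges, and the helper functions `vary_one` and `vary_two` are hypothetical and are not taken from the paper.

```python
from itertools import product

# Hypothetical central configuration. The keys mirror the four forms of
# parallelism named in the abstract; the numeric values are assumptions.
CENTRAL = {
    "issue_width": 2,     # microoperations issued per cycle
    "result_buses": 2,    # result distribution buses
    "exec_units": 2,      # execution units
    "pipeline_depth": 2,  # stages in each pipelined execution unit
}

# Hypothetical candidate values for each form.
RANGES = {name: (1, 2, 4) for name in CENTRAL}


def vary_one(central, ranges):
    """Yield configurations that differ from the central one in exactly one form."""
    for name, values in ranges.items():
        for value in values:
            if value != central[name]:
                yield {**central, name: value}


def vary_two(central, ranges, first, second):
    """Yield every combination of two forms, holding the others at central values."""
    for a, b in product(ranges[first], ranges[second]):
        yield {**central, first: a, second: b}


single = list(vary_one(CENTRAL, RANGES))
paired = list(vary_two(CENTRAL, RANGES, "issue_width", "exec_units"))
print(len(single), len(paired))
```

In an actual study, each generated configuration would be passed to the compiler and a simulator so that code is scheduled for that specific configuration before performance is measured; that step is outside the scope of this sketch.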