ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
A VLIW architecture for a trace Scheduling Compiler
IEEE Transactions on Computers - Special issue on architectural support for programming languages and operating systems
Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
IEEE Spectrum
Architecture and compiler tradeoffs for a long instruction wordprocessor
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Limits on multiple instruction issue
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Implementation of precise interrupts in pipelined processors
ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
ACM Computing Surveys (CSUR)
Super-Scalar Processor Design
Planning a computer system: Project Stretch
Planning a computer system: Project Stretch
High-bandwidth data memory systems for superscalar processors
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Global instruction scheduling for superscalar machines
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
OHMEGA: a VLSI superscalar processor architecture for numerical applications
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
IMPACT: an architectural framework for multiple-instruction-issue processors
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Exploiting fine-grained parallelism through a combination of hardware and software techniques
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
A parallel pipelined processor with conditional instruction execution
ACM SIGARCH Computer Architecture News - Symposium on parallel algorithms and architectures
The effect of employing advanced branching mechanisms in superscalar processors
ACM SIGARCH Computer Architecture News
ACM SIGARCH Computer Architecture News
Comparing static and dynamic code scheduling for multiple-instruction-issue processors
MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Limits of control flow on parallelism
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
An elementary processor architecture with simultaneous instruction issuing from multiple threads
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Software support for speculative loads
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Sentinel scheduling for VLIW and superscalar processors
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Efficient superscalar performance through boosting
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Exploiting instruction-level parallelism with the conjugate register file scheme
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
An efficient resource-constrained global scheduling technique for superscalar and VLIW processors
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Exploiting instruction-level parallelism: the multithreaded approach
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Performance evaluation of instruction scheduling on the IBM RISC System/6000
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Performance analysis and design methodology for a scalable superscalar architecture
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Sentinel scheduling: a model for compiler-controlled speculative execution
ACM Transactions on Computer Systems (TOCS)
Speculative execution and branch prediction on parallel machines
ICS '93 Proceedings of the 7th international conference on Supercomputing
Enhanced superscalar hardware: the schedule table
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
The anatomy of the register file in a multiscalar processor
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Unconstrained speculative execution with predicated state buffering
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
A macrotask-level unlimited speculative execution on multiprocessors
ICS '95 Proceedings of the 9th international conference on Supercomputing
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Performance comparison of ILP machines with cycle time evaluation
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
The 16-fold way: a microparallel taxonomy
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Control flow prediction for dynamic ILP processors
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Speculative execution exception recovery using write-back suppression
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Evaluation of scheduling techniques on a SPARC-based VLIW testbed
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
A decision procedure for bit-vector arithmetic
DAC '98 Proceedings of the 35th annual Design Automation Conference
Integrated predicated and speculative execution in the IMPACT EPIC architecture
Proceedings of the 25th annual international symposium on Computer architecture
IMPACT: an architectural framework for multiple-instruction-issue processors
25 years of the international symposia on Computer architecture (selected papers)
Partial method compilation using dynamic profile information
OOPSLA '01 Proceedings of the 16th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
On the Boosting of Instruction Scheduling by Renaming
The Journal of Supercomputing
The Importance of Prepass Code Scheduling for Superscalar and Superpipelined Processors
IEEE Transactions on Computers
Three Architectural Models for Compiler-Controlled Speculative Execution
IEEE Transactions on Computers
IEEE Transactions on Computers
Optimizing Java Programs in the Presence of Exceptions
ECOOP '00 Proceedings of the 14th European Conference on Object-Oriented Programming
Eliminating Exception Constraints of Java Programs for IA-64
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Improved Implementations of the Speculative Memory Access Mechanism specMEM
IWIA '99 Proceedings of the 1999 International Workshop on Innovative Architecture
Compiler orchestrated prefetching via speculation and predication
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Application driven embedded system design: a face recognition case study
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Hi-index | 0.01 |
This paper describes a superscalar processor that combines the best qualities of static and dynamic instruction scheduling to increase the performance of non-numerical applications. The architecture performs all instruction scheduling statically to take advantage of the compiler's ability to efficiently schedule operations across many basic blocks. Since the conditional branches in non-numerical code are highly data dependent, the architecture introduces the concept of boosted instructions, instructions that are committed conditionally upon the result of later branch instructions. Boosting effectively removes the dependencies caused by branches and makes the scheduling of side-effect instructions as simple as those that are side-effect free. For efficiency, boosting is supported in the hardware by shadow structures that temporarily hold the side effects of boosted instructions until the conditional branches that the boosted instructions depend upon are executed. When the branch condition is determined, the buffered side effects are either committed or squashed. The limited static scheduler in our evaluation system shows that a 1.6-times speedup over scalar code is achievable by boosting instructions above only a single conditional branch. This performance is similar to the performance of a pure dynamic scheduler.