Superscalar machines can issue several instructions per cycle. Superpipelined machines can issue only one instruction per cycle, but they have cycle times shorter than the latency of any functional unit. In this paper these two techniques are shown to be roughly equivalent ways of exploiting instruction-level parallelism. A parameterizable code reorganization and simulation system was developed and used to measure instruction-level parallelism for a series of benchmarks. Results of these simulations in the presence of various compiler optimizations are presented. The average degree of superpipelining metric is introduced. Our simulations suggest that this metric is already high for many machines. These machines already exploit all of the instruction-level parallelism available in many non-numeric applications, even without parallel instruction issue or higher degrees of pipelining.
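The abstract names the average degree of superpipelining but does not define it. The sketch below assumes one plausible reading: the metric is the frequency-weighted mean of operation latencies, measured in machine cycles, over the dynamic instruction mix. The function name, the instruction classes, and every number in `example_mix` are hypothetical and chosen only for illustration; they are not measurements from the paper.

```python
def average_degree_of_superpipelining(mix):
    """Frequency-weighted mean operation latency, in cycles.

    `mix` is an iterable of (frequency, latency_cycles) pairs, one per
    instruction class. Frequencies need not sum to 1; they are normalized.
    ASSUMPTION: this definition is an illustrative reading of the metric,
    not a formula quoted from the abstract.
    """
    pairs = list(mix)
    total = sum(freq for freq, _ in pairs)
    if total <= 0:
        raise ValueError("instruction mix must have positive total frequency")
    return sum(freq * latency for freq, latency in pairs) / total


# Hypothetical instruction mix: class -> (dynamic frequency, latency in cycles).
example_mix = {
    "alu": (0.50, 1),
    "load": (0.25, 2),
    "branch": (0.25, 2),
}

degree = average_degree_of_superpipelining(example_mix.values())
print(f"average degree of superpipelining: {degree}")
```

Under this reading, a value above 1 means that a typical operation already takes longer than one cycle, so a machine that issues one instruction per cycle is already overlapping operations. That is the sense in which such a machine can consume the available instruction-level parallelism without parallel issue or deeper pipelining.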