IMPACT: an architectural framework for multiple-instruction-issue processors

Authors:
Pohua P. Chang; Scott A. Mahlke; William Y. Chen; Nancy J. Warter; Wen-mei W. Hwu
Affiliations:
Center for Reliable and High-Performance Computing, University of Illinois, Urbana, IL (all authors)
Venue:
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Year:
1991

Citing 28
Cited 126

Bulldog: a compiler for VLSI architectures
Compilers: principles, techniques, and tools
An Instruction Issuing Approach to Enhancing Performance in Multiple Functional Unit Processors

IEEE Transactions on Computers
HPSm, a high performance restricted data flow architecture having minimal functionality

ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Reducing the cost of branches

ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
HPS, a new microarchitecture: rationale and introduction

MICRO 18 Proceedings of the 18th annual workshop on Microprogramming
A study of scalar compilation techniques for pipelined supercomputers

ASPLOS II Proceedings of the second international conference on Architectural support for programming languages and operating systems
A VLIW architecture for a trace scheduling compiler

ASPLOS II Proceedings of the second international conference on Architectural support for programming languages and operating systems
Exploiting parallel microprocessor microarchitectures with a compiler code generator

ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Software pipelining: an effective scheduling technique for VLIW machines

PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Code scheduling and register allocation in large basic blocks

ICS '88 Proceedings of the 2nd international conference on Supercomputing
The Cydra 5 Departmental Supercomputer: Design Philosophies, Decisions, and Trade-Offs

Computer
Trace selection for compiling large C application programs to microcode

MICRO 21 Proceedings of the 21st annual workshop on Microprogramming and microarchitecture
Architecture and compiler tradeoffs for a long instruction word processor

ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Tradeoffs in instruction format design for horizontal architectures

ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Available instruction-level parallelism for superscalar and superpipelined machines

ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Limits on multiple instruction issue

ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Inline function expansion for compiling C programs

PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Achieving high instruction cache performance with an optimizing compiler

ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Forward semantic: a compiler-assisted instruction fetch method for heavily pipelined processors

MICRO 22 Proceedings of the 22nd annual workshop on Microprogramming and microarchitecture
Code compaction for parallel architectures

Software—Practice & Experience
Instruction scheduling beyond basic blocks

IBM Journal of Research and Development
Trace scheduling optimization in a retargetable microcode compiler

MICRO 20 Proceedings of the 20th annual workshop on Microprogramming
Control flow optimization for supercomputer scalar processing

ICS '89 Proceedings of the 3rd international conference on Supercomputing
Boosting beyond static scheduling in a superscalar processor

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Register allocation by priority-based coloring

SIGPLAN '84 Proceedings of the 1984 SIGPLAN symposium on Compiler construction
Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing

MICRO 14 Proceedings of the 14th annual workshop on Microprogramming
Register allocation & spilling via graph coloring

SIGPLAN '82 Proceedings of the 1982 SIGPLAN symposium on Compiler construction

Data access microarchitectures for superscalar processors with compiler-assisted data prefetching

MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Concurrency Extraction Via Hardware Methods Executing the Static Instruction Stream

IEEE Transactions on Computers
Tolerating data access latency with register preloading

ICS '92 Proceedings of the 6th international conference on Supercomputing
Sentinel scheduling for VLIW and superscalar processors

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Efficient superscalar performance through boosting

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Effective compiler support for predicated execution using the hyperblock

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
An efficient architecture for loop based data preloading

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Code scheduling for VLIW/superscalar processors with limited register files

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Performance evaluation of instruction scheduling on the IBM RISC System/6000

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
A new approach to schedule operations across nested-ifs and nested-loops

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Compiler code transformations for superscalar-based high performance systems

Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Sentinel scheduling: a model for compiler-controlled speculative execution

ACM Transactions on Computer Systems (TOCS)
Enhanced superscalar hardware: the schedule table

Proceedings of the 1993 ACM/IEEE conference on Supercomputing
VLIW compilation techniques in a superscalar environment

PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Guarded execution and branch prediction in dynamic ILP processors

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Data relocation and prefetching for programs with large data sets

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
The effects of predicated execution on branch prediction

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Analysis of the conditional skip instructions of the HP precision architecture

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Unconstrained speculative execution with predicated state buffering

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Using predicated execution to improve the performance of a dynamically scheduled machine with speculative execution

PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Dynamic rescheduling: a technique for object code compatibility in VLIW architectures

Proceedings of the 28th annual international symposium on Microarchitecture
A reduced multipipeline machine description that preserves scheduling constraints

PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Source-level debugging of scalar optimized code

PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Automating parallel runtime optimizations using post-mortem analysis

ICS '96 Proceedings of the 10th international conference on Supercomputing
Accurate and practical profile-driven compilation using the profile buffer

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Speculative hedge: regulating compile-time speculation against profile variations

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Java bytecode to native code translation: the caffeine prototype and preliminary results

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Optimization of machine descriptions for efficient use

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
A study on the number of memory ports in multiple instruction issue machines

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
The 16-fold way: a microparallel taxonomy

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Control flow prediction for dynamic ILP processors

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Branch history table indexing to prevent pipeline bubbles in wide-issue superscalar processors

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Speculative execution exception recovery using write-back suppression

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Superblock formation using static program analysis

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Run-time adaptive cache hierarchy management via reference analysis

Proceedings of the 24th annual international symposium on Computer architecture
Run-time spatial locality detection and optimization

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Evaluation of scheduling techniques on a SPARC-based VLIW testbed

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The multicluster architecture: reducing cycle time through partitioning

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The filter cache: an energy efficient memory structure

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Procedure based program compression

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
MediaBench: a tool for evaluating and synthesizing multimedia and communications systems

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Simulation/evaluation environment for a VLIW processor architecture

IBM Journal of Research and Development - Special issue: performance analysis and its impact on design
Media architecture: general purpose vs. multiple application-specific programmable processor

DAC '98 Proceedings of the 35th annual Design Automation Conference
Compiler-directed early load-address generation

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
An out-of-order execution technique for runtime binary translators

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
MPS: Miss-Path Scheduling for Multiple-Issue Processors

IEEE Transactions on Computers
Load-reuse analysis: design and evaluation

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
A new framework for debugging globally optimized code

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Power efficient mediaprocessors: design space exploration

Proceedings of the 36th annual ACM/IEEE Design Automation Conference
Designing power efficient hypermedia processors

ISLPED '99 Proceedings of the 1999 international symposium on Low power electronics and design
The Multicluster Architecture: Reducing Processor Cycle Time Through Partitioning

International Journal of Parallel Programming
Filtering Memory References to Increase Energy Efficiency

IEEE Transactions on Computers
Probabilistic Loop Scheduling for Applications with Uncertain Execution Time

IEEE Transactions on Computers
Run-Time Cache Bypassing

IEEE Transactions on Computers
Function unit specialization through code analysis

ICCAD '99 Proceedings of the 1999 IEEE/ACM international conference on Computer-aided design
Localized watermarking: methodology and application to operation scheduling

ICCAD '99 Proceedings of the 1999 IEEE/ACM international conference on Computer-aided design
Procedure Based Program Compression

International Journal of Parallel Programming - Special issue on the 30th annual ACM/IEEE international symposium on microarchitecture, part II
Modular interprocedural pointer analysis using access paths: design, implementation, and evaluation

PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
An integrated approach to accelerate data and predicate computations in hyperblocks

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Two-level hierarchical register file organization for VLIW processors

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Techniques for obtaining high performance in Java programs

ACM Computing Surveys (CSUR)
Offline program re-mapping to improve branch prediction efficiency in embedded systems

ASP-DAC '00 Proceedings of the 2000 Asia and South Pacific Design Automation Conference
A technique for QoS-based system partitioning

ASP-DAC '00 Proceedings of the 2000 Asia and South Pacific Design Automation Conference
Exploring Hypermedia Processor Design Space

Journal of VLSI Signal Processing Systems - Special issue on multimedia signal processing
Reversible Debugging Using Program Instrumentation

IEEE Transactions on Software Engineering
Partial method compilation using dynamic profile information

OOPSLA '01 Proceedings of the 16th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
An interleaved cache clustered VLIW processor

ICS '02 Proceedings of the 16th international conference on Supercomputing
A study of compiler techniques for multiple targets in compiler infrastructures

ACM SIGPLAN Notices
Datapath merging and interconnection sharing for reconfigurable architectures

Proceedings of the 15th international symposium on System Synthesis
I-CoPES: fast instruction code placement for embedded systems to improve performance and energy efficiency

Proceedings of the 2001 IEEE/ACM international conference on Computer-aided design
Architectural differences of efficient sequential and parallel computers

Journal of Systems Architecture: the EUROMICRO Journal
Handling Global Constraints in Compiler Strategy

International Journal of Parallel Programming
Optimization of Machine Descriptions for Efficient Use

International Journal of Parallel Programming
Optimizing NET Compilers for Improved Java Performance

Computer
The Effect of Code Expanding Optimizations on Instruction Cache Design

IEEE Transactions on Computers
The Importance of Prepass Code Scheduling for Superscalar and Superpipelined Processors

IEEE Transactions on Computers
Three Architectural Models for Compiler-Controlled Speculative Execution

IEEE Transactions on Computers
Compiler-Assisted Multiple Instruction Rollback Recovery Using a Read Buffer

IEEE Transactions on Computers
Efficient Exploitation of Instruction-Level Parallelism for Superscalar Processors by the Conjugate Register File Scheme

IEEE Transactions on Computers
Modeling the impact of run-time uncertainty on optimal computation scheduling using feedback

ICPP '97 Proceedings of the international Conference on Parallel Processing
Hybrid Predication Model for Instruction Level Parallelism

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Optimizing Java Programs in the Presence of Exceptions

ECOOP '00 Proceedings of the 14th European Conference on Object-Oriented Programming
Selective Scheduling Framework for Speculative Operations in VLIW and Superscalar Processors

PACT '93 Proceedings of the IFIP WG10.3. Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism
Balancing Fine- and Medium-Grained Parallelism in Scheduling Loops for the XIMD Architecture

PACT '93 Proceedings of the IFIP WG10.3. Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism
Comparing Tail Duplication with Compensation Code in Single Path Global Instruction Scheduling

CC '01 Proceedings of the 10th International Conference on Compiler Construction
An Efficient Technique of Instruction Scheduling on a Superscalar-Based Multiprocessor

IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Effective instruction scheduling techniques for an interleaved cache clustered VLIW processor

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Local scheduling techniques for memory coherence in a clustered VLIW processor with a distributed data cache

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
The Use of Feedback in Scheduling Parallel Computations

PAS '97 Proceedings of the 2nd AIZU International Symposium on Parallel Algorithms / Architecture Synthesis
Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture

Proceedings of the 30th annual international symposium on Computer architecture
Flexible Compiler-Managed L0 Buffers for Clustered VLIW Processors

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Power modeling and reduction of VLIW processors

Compilers and operating systems for low power
Processor-memory coexploration using an architecture description language

ACM Transactions on Embedded Computing Systems (TECS)
TRIPS: A polymorphous architecture for exploiting ILP, TLP, and DLP

ACM Transactions on Architecture and Code Optimization (TACO)
The design of dynamically reconfigurable datapath coprocessors

ACM Transactions on Embedded Computing Systems (TECS)
Architectural Support for Enhanced SMT Job Scheduling

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Static Placement, Dynamic Issue (SPDI) Scheduling for EDGE Architectures

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Practical and Accurate Low-Level Pointer Analysis

Proceedings of the international symposium on Code generation and optimization
Distributed Data Cache Designs for Clustered VLIW Processors

IEEE Transactions on Computers
Instruction code mapping for performance increase and energy reduction in embedded computer systems

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Variable-Based Multi-module Data Caches for Clustered VLIW Processors

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
High-Performance and Low-Cost Dual-Thread VLIW Processor Using Weld Architecture Paradigm

IEEE Transactions on Parallel and Distributed Systems
A novel instruction scratchpad memory optimization method based on concomitance metric

ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
Software and hardware techniques to optimize register file utilization in VLIW architectures

International Journal of Parallel Programming
Multi-parametric improvements for embedded systems using code-placement and address bus coding

ASP-DAC '03 Proceedings of the 2003 Asia and South Pacific Design Automation Conference
Automatic instruction scheduler retargeting by reverse-engineering

Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
A lifetime optimal algorithm for speculative PRE

ACM Transactions on Architecture and Code Optimization (TACO)
Extracting and improving microarchitecture performance on reconfigurable architectures

International Journal of Parallel Programming - Special issue: The next generation software program
Compiler optimization of embedded applications for an adaptive SoC architecture

CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Impact of intercluster communication mechanisms on ILP in clustered VLIW architectures

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Hybrid multi-core architecture for boosting single-threaded performance

ACM SIGARCH Computer Architecture News
Virtual Cluster Scheduling Through the Scheduling Graph

Proceedings of the International Symposium on Code Generation and Optimization
An Analytical Approach to Scheduling Code for Superscalar and VLIW Architectures

ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 01
Mapping control-intensive video kernels onto a coarse-grain reconfigurable architecture: the H.264/AVC deblocking filter

Proceedings of the conference on Design, automation and test in Europe
Energy-optimizing source code transformations for operating system-driven embedded software

ACM Transactions on Embedded Computing Systems (TECS)
Effective Code Generation for Distributed and Ping-Pong Register Files: A Case Study on PAC VLIW DSP Cores

Journal of Signal Processing Systems
Compiler Controlled Speculation for Power Aware ILP Extraction in Dataflow Architectures

HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Mapping of nomadic multimedia applications on the ADRES reconfigurable array processor

Microprocessors & Microsystems
MediaBench II video: Expediting the next generation of video systems research

Microprocessors & Microsystems
The implementation and evaluation of a low-power clock distribution network based on EPIC

NPC'07 Proceedings of the 2007 IFIP international conference on Network and parallel computing
Still Image Processing on Coarse-Grained Reconfigurable Array Architectures

Journal of Signal Processing Systems
A formal method for providing temporal equivalence in binary-to-binary translation of real-time applications

RTSS'10 Proceedings of the 21st IEEE conference on Real-time systems symposium
Automatic application-specific microarchitecture reconfiguration

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
A VLIW-based post compilation framework for multimedia embedded DSPs with hardware specific optimizations

MTPP'10 Proceedings of the Second Russia-Taiwan conference on Methods and tools of parallel programming multicomputers
Exploiting statistical information for implementation of instruction scratchpad memory in embedded system

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Power consumption analysis of embedded multimedia application

ICESS'05 Proceedings of the Second international conference on Embedded Software and Systems
