The superblock: an effective technique for VLIW and superscalar compilation

Authors:
Wen-Mei W. Hwu; Scott A. Mahlke; William Y. Chen; Pohua P. Chang; Nancy J. Warter; Roger A. Bringmann; Roland G. Ouellette; Richard E. Hank; Tokuzo Kiyohara; Grant E. Haab; John G. Holm; Daniel M. Lavery
Venue:
The Journal of Supercomputing - Special issue on instruction-level parallelism
Year:
1993

Citing 0
Cited 209

Effective compiler support for predicated execution using the hyperblock

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Sentinel scheduling: a model for compiler-controlled speculative execution

ACM Transactions on Computer Systems (TOCS)
Register connection: a new approach to adding registers into instruction set architectures

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Increasing the instruction fetch rate via multiple branch prediction and a branch address cache

ICS '93 Proceedings of the 7th international conference on Supercomputing
Avoidance and suppression of compensation code in a trace scheduling compiler

ACM Transactions on Programming Languages and Systems (TOPLAS)
Height reduction of control recurrences for ILP processors

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Iterative modulo scheduling: an algorithm for software pipelining loops

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Data relocation and prefetching for programs with large data sets

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
A comparison of two pipeline organizations

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Register file port requirements of transport triggered architectures

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Characterizing the impact of predicated execution on branch prediction

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Dynamic memory disambiguation using the memory conflict buffer

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Improving the accuracy of static branch prediction using branch correlation

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Compiler transformations for high-performance computing

ACM Computing Surveys (CSUR)
GURRR: a global unified resource requirements representation

IR '95 Papers from the 1995 ACM SIGPLAN workshop on Intermediate representations
Optimization of instruction fetch mechanisms for high issue rates

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Performance issues in correlated branch prediction schemes

Proceedings of the 28th annual international symposium on Microarchitecture
Critical path reduction for scalar programs

Proceedings of the 28th annual international symposium on Microarchitecture
Modulo scheduling with multiple initiation intervals

Proceedings of the 28th annual international symposium on Microarchitecture
Region-based compilation: an introduction and motivation

Proceedings of the 28th annual international symposium on Microarchitecture
Dynamic rescheduling: a technique for object code compatibility in VLIW architectures

Proceedings of the 28th annual international symposium on Microarchitecture
Unrolling-based optimizations for modulo scheduling

Proceedings of the 28th annual international symposium on Microarchitecture
A comparison of full and partial predicated execution support for ILP processors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Importance of profiling and compatibility

ACM Computing Surveys (CSUR) - Special issue: position statements on strategic directions in computing research
Accurate and practical profile-driven compilation using the profile buffer

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Profile-driven instruction level parallel scheduling with application to super blocks

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Speculative hedge: regulating compile-time speculation against profile variations

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Modulo scheduling of loops in control-intensive non-numeric programs

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Compiler synthesized dynamic branch prediction

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Increasing the instruction fetch rate via block-structured instruction set architectures

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Instruction fetch mechanisms for VLIW architectures with compressed encodings

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Meld scheduling: relaxing scheduling constraints across region boundaries

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Predictability of load/store instruction latencies

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Speculative execution exception recovery using write-back suppression

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Superblock formation using static program analysis

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Partial dead code elimination using slicing transformations

Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Near-optimal intraprocedural branch alignment

Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
The bi-mode branch predictor

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Reducing the performance impact of instruction cache misses by writing instructions into the reservation stations out-of-order

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
A framework for balancing control flow and predication

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Tuning compiler optimizations for simultaneous multithreading

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
MediaBench: a tool for evaluating and synthesizing multimedia and communications systems

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Resource-sensitive profile-directed data flow analysis for code optimization

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Parallelizing nonnumerical code with selective scheduling and software pipelining

ACM Transactions on Programming Languages and Systems (TOPLAS)
Media architecture: general purpose vs. multiple application-specific programmable processor

DAC '98 Proceedings of the 35th annual Design Automation Conference
Improving data-flow analysis with path profiles

PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Effective cluster assignment for modulo scheduling

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Better global scheduling using path profiles

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Space-time scheduling of instruction-level parallelism on a raw machine

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
An out-of-order execution technique for runtime binary translators

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Value speculation scheduling for high performance processors

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
A Trace Cache Microarchitecture and Evaluation

IEEE Transactions on Computers - Special issue on cache memory and related problems
Evaluation of Design Options for the Trace Cache Fetch Mechanism

IEEE Transactions on Computers - Special issue on cache memory and related problems
Value prediction in VLIW machines

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Load-reuse analysis: design and evaluation

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Control CPR: a branch height reduction optimization for EPIC architectures

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Control Flow Prediction Schemes for Wide-Issue Superscalar Processors

IEEE Transactions on Parallel and Distributed Systems
Reorganizing global schedules for register allocation

ICS '99 Proceedings of the 13th international conference on Supercomputing
Power efficient mediaprocessors: design space exploration

Proceedings of the 36th annual ACM/IEEE Design Automation Conference
Designing power efficient hypermedia processors

ISLPED '99 Proceedings of the 1999 international symposium on Low power electronics and design
Efficient and precise modeling of exceptions for the analysis of Java programs

Proceedings of the 1999 ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering
Compiler-driven cached code compression schemes for embedded ILP processors

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Automatic and efficient evaluation of memory hierarchies for embedded systems

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Wavefront scheduling: path based data representation and scheduling of subgraphs

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Balance scheduling: weighting branch tradeoffs in superblocks

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
The Partial Reverse If-Conversion Framework for Balancing Control Flow and Predication

International Journal of Parallel Programming
Static correlated branch prediction

ACM Transactions on Programming Languages and Systems (TOPLAS)
A hardware mechanism for dynamic extraction and relayout of program hot spots

Proceedings of the 27th annual international symposium on Computer architecture
Tuning Compiler Optimizations for Simultaneous Multithreading

International Journal of Parallel Programming - Special issue on the 30th annual ACM/IEEE international symposium on microarchitecture, part II
Dynamo: a transparent dynamic optimization system

PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Modular interprocedural pointer analysis using access paths: design, implementation, and evaluation

PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Overcoming the challenges to feedback-directed optimization (Keynote Talk)

DYNAMO '00 Proceedings of the ACM SIGPLAN workshop on Dynamic and adaptive compilation and optimization
Fusion-based register allocation

ACM Transactions on Programming Languages and Systems (TOPLAS)
Properties of Rescheduling Size Invariance for Dynamic Rescheduling-Based VLIW Cross-Generation Compatibility

IEEE Transactions on Computers
Hardware support for dynamic activation of compiler-directed computation reuse

ACM SIGPLAN Notices
An integrated approach to accelerate data and predicate computations in hyperblocks

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Two-level hierarchical register file organization for VLIW processors

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Offline program re-mapping to improve branch prediction efficiency in embedded systems

ASP-DAC '00 Proceedings of the 2000 Asia and South Pacific Design Automation Conference
A technique for QoS-based system partitioning

ASP-DAC '00 Proceedings of the 2000 Asia and South Pacific Design Automation Conference
Exploring Hypermedia Processor Design Space

Journal of VLSI Signal Processing Systems - Special issue on multimedia signal processing
Exploring the Interaction between Java's Implicitly Thrown Exceptions and Instruction Scheduling

International Journal of Parallel Programming
Clustered VLIW architecture with predicated switching

Proceedings of the 38th annual Design Automation Conference
Hardware support for dynamic activation of compiler-directed computation reuse

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Compiler optimization of dynamic data distributions for distributed-memory multicomputers

Compiler optimizations for scalable parallel systems
Scheduling time-constrained instructions on pipelined processors

ACM Transactions on Programming Languages and Systems (TOPLAS)
Partial method compilation using dynamic profile information

OOPSLA '01 Proceedings of the 16th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Compiler-Assisted Multiple Instruction Word Retry for VLIW Architectures

IEEE Transactions on Parallel and Distributed Systems
Scheduling Superblocks with Bound-Based Branch Trade-Offs

IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
Evaluating the use of profiling by a region-based register allocator

Proceedings of the 2002 ACM symposium on Applied computing
On the Boosting of Instruction Scheduling by Renaming

The Journal of Supercomputing
Exploiting VLIW schedule slacks for dynamic and leakage energy reduction

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Efficient static single assignment form for predication

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Embedded software in real-time signal processing systems: design technologies

Readings in hardware/software co-design
Code coverage and input variability: effects on architecture and compiler research

CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems
Cost effective memory disambiguation for multimedia codes

CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems
Online feedback-directed optimization of Java

OOPSLA '02 Proceedings of the 17th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Handling Global Constraints in Compiler Strategy

International Journal of Parallel Programming
Meld Scheduling: A Technique for Relaxing Scheduling Constraints

International Journal of Parallel Programming
Increasing the Instruction Fetch Rate via Block-Structured Instruction Set Architectures

International Journal of Parallel Programming
Software Trace Cache for Commercial Applications

International Journal of Parallel Programming
Backtracking-Based Instruction Scheduling to Fill Branch Delay Slots

International Journal of Parallel Programming
Changing Interaction of Compiler and Architecture

Computer
Compilers for Instruction-Level Parallelism

Computer
The IA-64 Architecture at Work

Computer
Introducing the FR500 Embedded Microprocessor

IEEE Micro
The Importance of Prepass Code Scheduling for Superscalar and Superpipelined Processors

IEEE Transactions on Computers
Three Architectural Models for Compiler-Controlled Speculative Execution

IEEE Transactions on Computers
Minimum Register Instruction Sequencing to Reduce Register Spills in Out-of-Order Issue Superscalar Architectures

IEEE Transactions on Computers
Modeling Value Speculation: An Optimal Edge Selection Problem

IEEE Transactions on Computers
Data remapping for design space optimization of embedded memory systems

ACM Transactions on Embedded Computing Systems (TECS)
Dynamic Path Profile Aided Recompilation in a JAVA Just-In-Time Compiler

HiPC '02 Proceedings of the 9th International Conference on High Performance Computing
Instruction Scheduling in the Presence of Java's Runtime Exceptions

LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
An Architecture Framework for Introducing Predicated Execution into Embedded Microprocessors

Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
FlexCC2: An Optimizing Retargetable C Compiler for DSP Processors

EMSOFT '02 Proceedings of the Second International Conference on Embedded Software
Comparing Tail Duplication with Compensation Code in Single Path Global Instruction Scheduling

CC '01 Proceedings of the 10th International Conference on Compiler Construction
Eliminating Exception Constraints of Java Programs for IA-64

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Master/slave speculative parallelization

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Convergent scheduling

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Dynamic binary translation for accumulator-oriented architectures

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Improving quasi-dynamic schedules through region slip

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Split-Path Enhanced Pipeline Scheduling

IEEE Transactions on Parallel and Distributed Systems
A region-based compilation technique for a Java just-in-time compiler

PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Inter-Cluster Communication Models for Clustered VLIW Processors

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Memory Hierarchy Design for Jetpipeline: To Execute Scalar and Vector Instructions in Parallel

PAS '97 Proceedings of the 2nd AIZU International Symposium on Parallel Algorithms / Architecture Synthesis
Exploiting compiler-generated schedules for energy savings in high-performance processors

Proceedings of the 2003 international symposium on Low power electronics and design
Cluster assignment of global values for clustered VLIW processors

Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Increasing the number of effective registers in a low-power processor using a windowed register file

Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Hardware Support for Control Transfers in Code Caches

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Targeted Path Profiling: Lower Overhead Path Profiling for Staged Dynamic Optimization Systems

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Exploring Code Cache Eviction Granularities in Dynamic Optimization Systems

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Using Dynamic Binary Translation to Fuse Dependent Instructions

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
VHC: Quickly Building an Optimizer for Complex Embedded Architectures

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
RABIT: A New Framework for Runtime Emulation and Binary Translation

ANSS '04 Proceedings of the 37th annual symposium on Simulation
Efficient instruction scheduling for a pipelined architecture

ACM SIGPLAN Notices - Best of PLDI 1979-1999
Improving data-flow analysis with path profiles

ACM SIGPLAN Notices - Best of PLDI 1979-1999
Using Compressed Bytecode Traces for Slicing Java Programs

Proceedings of the 26th International Conference on Software Engineering
Field-testing IMPACT EPIC research results in Itanium 2

Proceedings of the 31st annual international symposium on Computer architecture
Compiler orchestrated prefetching via speculation and predication

ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Optimal Superblock Scheduling Using Enumeration

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
A scheduling algorithm for optimization and early planning in high-level synthesis

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Performance of Runtime Optimization on BLAST

Proceedings of the international symposium on Code generation and optimization
Practical Path Profiling for Dynamic Optimizers

Proceedings of the international symposium on Code generation and optimization
A Programmable Hardware Path Profiler

Proceedings of the international symposium on Code generation and optimization
Practical and Accurate Low-Level Pointer Analysis

Proceedings of the international symposium on Code generation and optimization
Sentinel PRE: Hoisting beyond Exception Dependency with Dynamic Deoptimization

Proceedings of the international symposium on Code generation and optimization
Dynamic run-time architecture techniques for enabling continuous optimization

Proceedings of the 2nd conference on Computing frontiers
Static strands: safely collapsing dependence chains for increasing embedded power efficiency

LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
An Architecture Framework for Transparent Instruction Set Customization in Embedded Processors

Proceedings of the 32nd annual international symposium on Computer Architecture
Partitioning Variables across Register Windows to Reduce Spill Code in a Low-Power Processor

IEEE Transactions on Computers
Dynamic memory interval test vs. interprocedural pointer analysis in multimedia applications

ACM Transactions on Architecture and Code Optimization (TACO)
Data-Dependency Graph Transformations for Instruction Scheduling

Journal of Scheduling
Future wireless convergence platforms

CODES+ISSS '05 Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Compiler-guided register reliability improvement against soft errors

Proceedings of the 5th ACM international conference on Embedded software
A region-based compilation technique for dynamic compilers

ACM Transactions on Programming Languages and Systems (TOPLAS)
Software and hardware techniques to optimize register file utilization in VLIW architectures

International Journal of Parallel Programming
2D-Profiling: Detecting Input-Dependent Branches with a Single Input Data Set

Proceedings of the International Symposium on Code Generation and Optimization
Dynamic parallelization and mapping of binary executables on hierarchical platforms

Proceedings of the 3rd conference on Computing frontiers
Reducing dynamic and leakage energy in VLIW architectures

ACM Transactions on Embedded Computing Systems (TECS)
A framework for unrestricted whole-program optimization

Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Compiler-directed thermal management for VLIW functional units

Proceedings of the 2006 ACM SIGPLAN/SIGBED conference on Language, compilers, and tool support for embedded systems
Reducing Startup Time in Co-Designed Virtual Machines

Proceedings of the 33rd annual international symposium on Computer Architecture
Improving WCET by applying worst-case path optimizations

Real-Time Systems
Global instruction scheduling in dynamic compilation for embedded systems

JTRES '06 Proceedings of the 4th international workshop on Java technologies for real-time and embedded systems
Software-based instruction caching for embedded processors

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Reaching fast code faster: using modeling for efficient software thread integration on a VLIW DSP

CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Impact of intercluster communication mechanisms on ILP in clustered VLIW architectures

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Merging Head and Tail Duplication for Convergent Hyperblock Formation

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Data-Dependency Graph Transformations for Superblock Scheduling

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Hybrid-scheduling for reduced energy consumption in high-performance processors

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Reducing code size in VLIW instruction scheduling

Journal of Embedded Computing - Low-power Embedded Systems
JIST: Just-In-Time scheduling translation for parallel processors

Scientific Programming
Hybrid multi-core architecture for boosting single-threaded performance

ACM SIGARCH Computer Architecture News
Hardware atomicity for reliable software speculation

Proceedings of the 34th annual international symposium on Computer architecture
Virtual Cluster Scheduling Through the Scheduling Graph

Proceedings of the International Symposium on Code Generation and Optimization
An Analytical Approach to Scheduling Code for Superscalar and VLIW Architectures

ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 01
Static strands: Safely exposing dependence chains for increasing embedded power efficiency

ACM Transactions on Embedded Computing Systems (TECS) - Special Section LCTES'05
A backtracking instruction scheduler using predicate-based code hoisting to fill delay slots

CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Enlarging Instruction Streams

IEEE Transactions on Computers
VEBoC: variation and error-aware design for billions of devices on a chip

Proceedings of the 2008 Asia and South Pacific Design Automation Conference
An Application of Constraint Programming to Superblock Instruction Scheduling

CP '08 Proceedings of the 14th international conference on Principles and Practice of Constraint Programming
On the exploitation of loop-level parallelism in embedded applications

ACM Transactions on Embedded Computing Systems (TECS)
Optimal trace scheduling using enumeration

ACM Transactions on Architecture and Code Optimization (TACO)
Techniques for efficient placement of synchronization primitives

Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Mostly static program partitioning of binary executables

ACM Transactions on Programming Languages and Systems (TOPLAS)
Dynamic parallelization of single-threaded binary programs using speculative slicing

Proceedings of the 23rd international conference on Supercomputing
Hardware-compiler co-design for adjustable data power savings

Microprocessors & Microsystems
MediaBench II video: Expediting the next generation of video systems research

Microprocessors & Microsystems
Modern development methods and tools for embedded reconfigurable systems: A survey

Integration, the VLSI Journal
Novel online profiling for virtual machines

Proceedings of the 6th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Dynamic binary translation specialized for embedded systems

Proceedings of the 6th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
A real system evaluation of hardware atomicity for software speculation

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Tree traversal scheduling: a global instruction scheduling technique for VLIW/EPIC processors

LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Instruction scheduling for VLIW processors under variation scenario

SAMOS'09 Proceedings of the 9th international conference on Systems, architectures, modeling and simulation
Dynamo: a transparent dynamic optimization system

ACM SIGPLAN Notices
Dynamic instruction scheduling in a trace-based multi-threaded architecture

International Journal of Parallel Programming
Hardware support for multithreaded execution of loops with limited parallelism

PCI'05 Proceedings of the 10th Panhellenic conference on Advances in Informatics
Integrating a new cluster assignment and scheduling algorithm into an experimental retargetable code generation framework

HiPC'05 Proceedings of the 12th international conference on High Performance Computing
Bioinformatics on embedded systems: a case study of computational biology applications on VLIW architecture

ICESS'05 Proceedings of the Second international conference on Embedded Software and Systems
An overview of the open research compiler

LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Trimaran: an infrastructure for research in instruction-level parallelism

LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
The use of traces for inlining in java programs

LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Bundled execution of recurring traces for energy-efficient general purpose processing

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
HydraVM: extracting parallelism from legacy sequential code using STM

HotPar'12 Proceedings of the 4th USENIX conference on Hot Topics in Parallelism
Elimination of parallel copies using code motion on data dependence graphs

Computer Languages, Systems and Structures
SMARQ: Software-Managed Alias Register Queue for Dynamic Optimizations

MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
LUCAS: latency-adaptive unified cluster assignment and instruction scheduling

Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
Intermediate representations in imperative compilers: A survey

ACM Computing Surveys (CSUR)
Software thread integration for instruction-level parallelism

ACM Transactions on Embedded Computing Systems (TECS)
CAeSaR: unified cluster-assignment scheduling and communication reuse for clustered VLIW processors

Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems

Quantified Score

Hi-index	0.02
