The Multiflow trace scheduling compiler

Authors:
P. Geoffrey Lowney;Stefan M. Freudenberger;Thomas J. Karzes;W. D. Lichtenstein;Robert P. Nix;John S. O'Donnell;John Ruttenberg
Venue:
The Journal of Supercomputing - Special issue on instruction-level parallelism
Year:
1993

Citing 0
Cited 120

Dependence-based program analysis

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Lifetime-sensitive modulo scheduling

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Avoidance and suppression of compensation code in a trace scheduling compiler

ACM Transactions on Programming Languages and Systems (TOPLAS)
Complexity/performance tradeoffs with non-blocking loads

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Iterative modulo scheduling: an algorithm for software pipelining loops

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Dynamic memory disambiguation for array references

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Improving the accuracy of static branch prediction using branch correlation

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Improving balanced scheduling with compiler optimizations that increase instruction-level parallelism

PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Simultaneous multithreading: maximizing on-chip parallelism

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Compiler techniques for data prefetching on the PowerPC

PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Performance issues in correlated branch prediction schemes

Proceedings of the 28th annual international symposium on Microarchitecture
The predictability of branches in libraries

Proceedings of the 28th annual international symposium on Microarchitecture
The performance impact of incomplete bypassing in processor pipelines

Proceedings of the 28th annual international symposium on Microarchitecture
Efficient instruction scheduling using finite state automata

Proceedings of the 28th annual international symposium on Microarchitecture
Critical path reduction for scalar programs

Proceedings of the 28th annual international symposium on Microarchitecture
Spill-free parallel scheduling of basic blocks

Proceedings of the 28th annual international symposium on Microarchitecture
The M-Machine multicomputer

Proceedings of the 28th annual international symposium on Microarchitecture
Region-based compilation: an introduction and motivation

Proceedings of the 28th annual international symposium on Microarchitecture
Unrolling-based optimizations for modulo scheduling

Proceedings of the 28th annual international symposium on Microarchitecture
A comparison of full and partial predicated execution support for ILP processors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
A reduced multipipeline machine description that preserves scheduling constraints

PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Fast, effective dynamic compilation

PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Exploiting dual data-memory banks in digital signal processors

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Global predicate analysis and its application to register allocation

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Instruction fetch mechanisms for VLIW architectures with compressed encodings

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Meld scheduling: relaxing scheduling constraints across region boundaries

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Custom-fit processors: letting applications define architectures

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Optimization of machine descriptions for efficient use

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
tcc: a system for fast, flexible, and high-level dynamic code generation

Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Near-optimal intraprocedural branch alignment

Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Annotation-directed run-time specialization in C

PEPM '97 Proceedings of the 1997 ACM SIGPLAN symposium on Partial evaluation and semantics-based program manipulation
Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading

ACM Transactions on Computer Systems (TOCS)
Tuning compiler optimizations for simultaneous multithreading

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The multicluster architecture: reducing cycle time through partitioning

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Exploiting idle floating-point resources for integer execution

PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Exploiting fine-grain thread level parallelism on the MIT multi-ALU processor

Proceedings of the 25th annual international symposium on Computer architecture
Simultaneous multithreading: maximizing on-chip parallelism

25 years of the international symposia on Computer architecture (selected papers)
Effective cluster assignment for modulo scheduling

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Better global scheduling using path profiles

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Unified assign and schedule: a new approach to scheduling for clustered register file microarchitectures

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Space-time scheduling of instruction-level parallelism on a Raw machine

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Maps: a compiler-managed memory system for Raw machines

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
The program decision logic approach to predicated execution

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Control CPR: a branch height reduction optimization for EPIC architectures

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
An evaluation of staged run-time optimizations in DyC

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Reorganizing global schedules for register allocation

ICS '99 Proceedings of the 13th international conference on Supercomputing
Resource usage models for instruction scheduling: two new models and a classification

ICS '99 Proceedings of the 13th international conference on Supercomputing
Concurrent Event Handling through Multithreading

IEEE Transactions on Computers
Exploiting ILP in page-based intelligent memory

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Software-Directed Register Deallocation for Simultaneous Multithreaded Processors

IEEE Transactions on Parallel and Distributed Systems
The Multicluster Architecture: Reducing Processor Cycle Time Through Partitioning

International Journal of Parallel Programming
Static correlated branch prediction

ACM Transactions on Programming Languages and Systems (TOPLAS)
Lx: a technology platform for customizable VLIW embedded processing

Proceedings of the 27th annual international symposium on Computer architecture
Tuning Compiler Optimizations for Simultaneous Multithreading

International Journal of Parallel Programming - Special issue on the 30th annual ACM/IEEE international symposium on microarchitecture, part II
Overcoming the challenges to feedback-directed optimization (Keynote Talk)

DYNAMO '00 Proceedings of the ACM SIGPLAN workshop on Dynamic and adaptive compilation and optimization
Fusion-based register allocation

ACM Transactions on Programming Languages and Systems (TOPLAS)
Data Dependence Analysis of Assembly Code

International Journal of Parallel Programming - Special issue on instruction-level parallelism and parallelizing compilation, part 2
Communication scheduling

ACM SIGPLAN Notices
The benefits and costs of DyC's run-time optimizations

ACM Transactions on Programming Languages and Systems (TOPLAS)
Lifetime-Sensitive Modulo Scheduling in a Production Environment

IEEE Transactions on Computers
Communication scheduling

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Partial method compilation using dynamic profile information

OOPSLA '01 Proceedings of the 16th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Compiler Support for Scalable and Efficient Memory Systems

IEEE Transactions on Computers
Energy estimation and optimization of embedded VLIW processors based on instruction clustering

Proceedings of the 39th annual Design Automation Conference
Affinity-based cluster assignment for unrolled loops

ICS '02 Proceedings of the 16th international conference on Supercomputing
An interleaved cache clustered VLIW processor

ICS '02 Proceedings of the 16th international conference on Supercomputing
The impact of if-conversion and branch prediction on program execution on the Intel® Itanium™ processor

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Reducing the complexity of the register file in dynamic superscalar processors

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Path Analysis and Renaming for Predicated Instruction Scheduling

International Journal of Parallel Programming
Meld Scheduling: A Technique for Relaxing Scheduling Constraints

International Journal of Parallel Programming
Optimization of Machine Descriptions for Efficient Use

International Journal of Parallel Programming
Backtracking-Based Instruction Scheduling to Fill Branch Delay Slots

International Journal of Parallel Programming
Compilers for Instruction-Level Parallelism

Computer
Simultaneous Multithreading: A Platform for Next-Generation Processors

IEEE Micro
The MAP1000A VLIW Mediaprocessor

IEEE Micro
An Advanced Optimizer for the IA-64 Architecture

IEEE Micro
Static resource models for code-size efficient embedded processors

ACM Transactions on Embedded Computing Systems (TECS)
A Spill Code Placement Framework for Code Scheduling

LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
Load Scheduling with Profile Information

Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Branch prediction techniques for low-power VLIW processors

Proceedings of the 13th ACM Great Lakes symposium on VLSI
Convergent scheduling

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Optimizations to prevent cache penalties for the Intel® Itanium® 2 Processor

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Phi-Predication for light-weight if-conversion

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Region-based hierarchical operation partitioning for multicluster processors

PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
A region-based compilation technique for a Java just-in-time compiler

PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Predicate prediction for efficient out-of-order execution

ICS '03 Proceedings of the 17th annual international conference on Supercomputing
How Useful Are Non-Blocking Loads, Stream Buffers and Speculative Execution in Multiple Issue Processors?

HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
RTGEN: An Algorithm for Automatic Generation of Reservation Tables from Architectural Descriptions

Proceedings of the 12th international symposium on System synthesis
Dynamically managing the communication-parallelism trade-off in future clustered processors

Proceedings of the 30th annual international symposium on Computer architecture
Hardware Support for Control Transfers in Code Caches

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
RTGEN: an algorithm for automatic generation of reservation tables from architectural descriptions

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Compiling for template-based run-time code generation

Journal of Functional Programming
Software pipelining: an effective scheduling technique for VLIW machines

ACM SIGPLAN Notices - Best of PLDI 1979-1999
A retrospective on: "an evaluation of staged run-time optimizations in DyC"

ACM SIGPLAN Notices - Best of PLDI 1979-1999
Operation tables for scheduling in the presence of incomplete bypassing

Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
A tour of Tempo: a program specializer for the C language

Science of Computer Programming - Special issue on program transformation
Application-Specific Processing on a General-Purpose Core via Transparent Instruction Set Customization

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Pin: building customized program analysis tools with dynamic instrumentation

Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
A Criticality Analysis of Clustering in Superscalar Processors

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
A region-based compilation technique for dynamic compilers

ACM Transactions on Programming Languages and Systems (TOPLAS)
Prematerialization: reducing register pressure for free

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Global instruction scheduling in dynamic compilation for embedded systems

JTRES '06 Proceedings of the 4th international workshop on Java technologies for real-time and embedded systems
Instruction scheduling for a tiled dataflow architecture

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Dataflow Predication

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
A wire delay-tolerant reconfigurable unit for a clustered programmable-reconfigurable processor

Microprocessors & Microsystems
Spike: an optimizer for Alpha/NT executables

NT'97 Proceedings of the USENIX Windows NT Workshop 1997
A backtracking instruction scheduler using predicate-based code hoisting to fill delay slots

CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Fast, frequency-based, integrated register allocation and instruction scheduling

Software—Practice & Experience
Techniques for Region-Based Register Allocation

Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Hybrid multithreading for VLIW processors

CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
Using a configurable processor generator for computer architecture prototyping

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
A scheduling approach for distributed resource architectures with scarce communication resources

International Journal of High Performance Systems Architecture
Retargetable pipeline hazard detection for partially bypassed processors

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Static speculation as post-link optimization for the Grid ALU Processor

Euro-Par 2010 Proceedings of the 2010 conference on Parallel processing
Revisiting graph coloring register allocation: a study of the Chaitin-Briggs and Callahan-Koblenz algorithms

LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
The use of traces for inlining in Java programs

LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
LUCAS: latency-adaptive unified cluster assignment and instruction scheduling

Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
Shared-port register file architecture for low-energy VLIW processors

ACM Transactions on Architecture and Code Optimization (TACO)
CAeSaR: unified cluster-assignment scheduling and communication reuse for clustered VLIW processors

Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems

Quantified Score

Hi-index	0.01
