Unified assign and schedule: a new approach to scheduling for clustered register file microarchitectures

Authors:
Emre Özer; Sanjeev Banerjia; Thomas M. Conte
Affiliations:
North Carolina State University, Raleigh, NC; Hewlett-Packard Laboratories, Cambridge, MA; North Carolina State University, Raleigh, NC
Venue:
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Year:
1998

Citing 15
Cited 68

Bulldog: a compiler for VLSI architectures

Bulldog: a compiler for VLSI architectures
A VLIW architecture for a trace scheduling compiler

ASPLOS II Proceedings of the second international conference on Architectural support for programming languages and operating systems
Partitioned register files for VLIWs: a preliminary analysis of tradeoffs

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
The Multiflow trace scheduling compiler

The Journal of Supercomputing - Special issue on instruction-level parallelism
Multiscalar processors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The M-Machine multicomputer

Proceedings of the 28th annual international symposium on Microarchitecture
MediaBench: a tool for evaluating and synthesizing multimedia and communications systems

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Exploiting idle floating-point resources for integer execution

PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Space-time scheduling of instruction-level parallelism on a Raw machine

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Parallel processing: a smart compiler and a dumb machine

SIGPLAN '84 Proceedings of the 1984 SIGPLAN symposium on Compiler construction
Computers and Intractability: A Guide to the Theory of NP-Completeness

Computers and Intractability: A Guide to the Theory of NP-Completeness
DSC: Scheduling Parallel Tasks on an Unbounded Number of Processors

IEEE Transactions on Parallel and Distributed Systems
Treegion Scheduling for Wide Issue Processors

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Instruction scheduling and fetch mechanisms for clustered VLIW processors

Instruction scheduling and fetch mechanisms for clustered VLIW processors
A formal approach to the scheduling problem in high level synthesis

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

Modulo scheduling for the TMS320C6x VLIW DSP architecture

Proceedings of the ACM SIGPLAN 1999 workshop on Languages, compilers, and tools for embedded systems
Communication scheduling

ACM SIGPLAN Notices
Modulo scheduling for a fully-distributed clustered VLIW architecture

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Communication scheduling

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
High-quality operation binding for clustered VLIW datapaths

Proceedings of the 38th annual Design Automation Conference
Loop Transformations for Architectures with Partitioned Register Banks

OM '01 Proceedings of the 2001 ACM SIGPLAN workshop on Optimization of middleware and distributed systems
Instruction scheduling for clustered VLIW architectures

ISSS '00 Proceedings of the 13th international symposium on System synthesis
Tailoring pipeline bypassing and functional unit mapping to application in clustered VLIW architectures

CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Loop fusion for clustered VLIW architectures

Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems
Affinity-based cluster assignment for unrolled loops

ICS '02 Proceedings of the 16th international conference on Supercomputing
Graph-partitioning based instruction scheduling for clustered processors

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Modulo scheduling with integrated register spilling for clustered VLIW architectures

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Cluster assignment for high-performance embedded VLIW processors

ACM Transactions on Design Automation of Electronic Systems (TODAES)
CALiBeR: a software pipelining algorithm for clustered embedded VLIW processors

Proceedings of the 2001 IEEE/ACM international conference on Computer-aided design
Exploiting Pseudo-Schedules to Guide Data Dependence Graph Partitioning

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Convergent scheduling

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Effective instruction scheduling techniques for an interleaved cache clustered VLIW processor

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Local scheduling techniques for memory coherence in a clustered VLIW processor with a distributed data cache

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Meta optimization: improving compiler heuristics with machine learning

PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Region-based hierarchical operation partitioning for multicluster processors

PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Inter-Cluster Communication Models for Clustered VLIW Processors

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
The Effectiveness of Loop Unrolling for Modulo Scheduling in Clustered VLIW Architectures

ICPP '00 Proceedings of the 2000 International Conference on Parallel Processing
Cluster assignment of global values for clustered VLIW processors

Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Flexible Compiler-Managed L0 Buffers for Clustered VLIW Processors

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Integrated temporal and spatial scheduling for extended operand clustered VLIW processors

Proceedings of the 1st conference on Computing frontiers
FLASH: Foresighted Latency-Aware Scheduling Heuristic for Processors with Customized Datapaths

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Cost-Sensitive Partitioning in an Architecture Synthesis System for Multicluster Processors

IEEE Micro
Automatic data partitioning for the Agere Payload Plus network processor

Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
Static Placement, Dynamic Issue (SPDI) Scheduling for EDGE Architectures

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Evaluation of Bus Based Interconnect Mechanisms in Clustered VLIW Architectures

Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
Differential register allocation

Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Distributed Data Cache Designs for Clustered VLIW Processors

IEEE Transactions on Computers
A cycle-accurate compilation algorithm for custom pipelined datapaths

CODES+ISSS '05 Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Utilizing Horizontal and Vertical Parallelism with a No-Instruction-Set Compiler for Custom Datapaths

ICCD '05 Proceedings of the 2005 International Conference on Computer Design
A Criticality Analysis of Clustering in Superscalar Processors

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Software and hardware techniques to optimize register file utilization in VLIW architectures

International Journal of Parallel Programming
Register aware scheduling for distributed cache clustered architecture

ASP-DAC '03 Proceedings of the 2003 Asia and South Pacific Design Automation Conference
Compiler-directed Data Partitioning for Multicluster Processors

Proceedings of the International Symposium on Code Generation and Optimization
A spatial path scheduling algorithm for EDGE architectures

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Instruction scheduling for a tiled dataflow architecture

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Compiler-assisted leakage energy optimization for clustered VLIW architectures

EMSOFT '06 Proceedings of the 6th ACM & IEEE International conference on Embedded software
Impact of intercluster communication mechanisms on ILP in clustered VLIW architectures

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Allocating architected registers through differential encoding

ACM Transactions on Programming Languages and Systems (TOPLAS)
Inter-cluster communication in VLIW architectures

ACM Transactions on Architecture and Code Optimization (TACO)
Virtual Cluster Scheduling Through the Scheduling Graph

Proceedings of the International Symposium on Code Generation and Optimization
Interactive presentation: Time-constrained clustering for DSE of clustered VLIW-ASP

Proceedings of the conference on Design, automation and test in Europe
Design principles for a virtual multiprocessor

Proceedings of the 2007 annual research conference of the South African institute of computer scientists and information technologists on IT research in developing countries
Modulo scheduling for highly customized datapaths to increase hardware reusability

Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Optimal vs. heuristic integrated code generation for clustered VLIW architectures

SCOPES '08 Proceedings of the 11th international workshop on Software & compilers for embedded systems
Computation and data transfer co-scheduling for interconnection bus minimization

Proceedings of the 2009 Asia and South Pacific Design Automation Conference
Loop-Aware Instruction Scheduling with Dynamic Contention Tracking for Tiled Dataflow Architectures

CC '09 Proceedings of the 18th International Conference on Compiler Construction: Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009
Evaluation of bus based interconnect mechanisms in clustered VLIW architectures

International Journal of Parallel Programming
Optimizing scheduling and intercluster connection for application-specific DSP processors

IEEE Transactions on Signal Processing
Compiler driven architecture design space exploration for DSP workloads: a study in software programmability versus hardware acceleration

Asilomar'09 Proceedings of the 43rd Asilomar conference on Signals, systems and computers
Compiler-assisted power optimization for clustered VLIW architectures

Parallel Computing
A scheduling approach for distributed resource architectures with scarce communication resources

International Journal of High Performance Systems Architecture
An efficient heuristic for instruction scheduling on clustered VLIW processors

CASES '11 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems
Exploring energy-performance trade-offs for heterogeneous interconnect clustered VLIW processors

HiPC'06 Proceedings of the 13th international conference on High Performance Computing
Integrating a new cluster assignment and scheduling algorithm into an experimental retargetable code generation framework

HiPC'05 Proceedings of the 12th international conference on High Performance Computing
Integrated Code Generation for Loops

ACM Transactions on Embedded Computing Systems (TECS)
WCET-aware re-scheduling register allocation for real-time embedded systems with clustered VLIW architecture

Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems
Compiler-assisted energy optimization for clustered VLIW processors

Journal of Parallel and Distributed Computing
Compiler supports for VLIW DSP processors with SIMD intrinsics

Concurrency and Computation: Practice & Experience
Code generation for an application-specific VLIW processor with clustered, addressable register files

Proceedings of the 10th Workshop on Optimizations for DSP and Embedded Systems
LUCAS: latency-adaptive unified cluster assignment and instruction scheduling

Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
Fast modulo scheduler utilizing patternized routes for coarse-grained reconfigurable architectures

ACM Transactions on Architecture and Code Optimization (TACO)
CAeSaR: unified cluster-assignment scheduling and communication reuse for clustered VLIW processors

Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems
Integrated modulo scheduling and cluster assignment for TI TMS320C64x+ architecture

Proceedings of the 11th Workshop on Optimizations for DSP and Embedded Systems
