Effective cluster assignment for modulo scheduling

Authors:
Erik Nystrom;Alexandre E. Eichenberger
Affiliations:
ECE Department, North Carolina State University, Box 7914, Raleigh NC;ECE Department, North Carolina State University, Box 7914, Raleigh NC
Venue:
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Year:
1998

Citing 19
Cited 44

Bulldog: a compiler for VLSI architectures

Bulldog: a compiler for VLSI architectures
Software pipelining: an effective scheduling technique for VLIW machines

PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Parallelization of loops with exits on pipelined architectures

Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Architecture and implementation of a VLIW supercomputer

Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Register allocation for software pipelined loops

PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Register requirements of pipelined processors

ICS '92 Proceedings of the 6th international conference on Supercomputing
Partitioned register files for VLIWs: a preliminary analysis of tradeoffs

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Code optimizers and register organizations for vector architectures

Code optimizers and register organizations for vector architectures
The multiflow trace scheduling compiler

The Journal of Supercomputing - Special issue on instruction-level parallelism
The superblock: an effective technique for VLIW and superscalar compilation

The Journal of Supercomputing - Special issue on instruction-level parallelism
Iterative modulo scheduling: an algorithm for software pipelining loops

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Stage scheduling: a technique to reduce the register requirements of a modulo schedule

Proceedings of the 28th annual international symposium on Microarchitecture
Modulo scheduling of loops in control-intensive non-numeric programs

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Software pipelining loops with conditional branches

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Efficient formulation for optimal modulo schedulers

Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Treegion Scheduling for Highly Parallel Processors

Euro-Par '97 Proceedings of the Third International Euro-Par Conference on Parallel Processing
Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing

MICRO 14 Proceedings of the 14th annual workshop on Microprogramming
Non-Consistent Dual Register Files to Reduce Register Pressure

HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Swing Modulo Scheduling: A Lifetime-Sensitive Approach

PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques

Communication scheduling

ACM SIGPLAN Notices
Modulo scheduling for a fully-distributed clustered VLIW architecture

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Lifetime-Sensitive Modulo Scheduling in a Production Environment

IEEE Transactions on Computers
Communication scheduling

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
High-quality operation binding for clustered VLIW datapaths

Proceedings of the 38th annual Design Automation Conference
Loop Transformations for Architectures with Partitioned Register Banks

OM '01 Proceedings of the 2001 ACM SIGPLAN workshop on Optimization of middleware and distributed systems
Instruction scheduling for clustered VLIW architectures

ISSS '00 Proceedings of the 13th international symposium on System synthesis
Loop fusion for clustered VLIW architectures

Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems
Affinity-based cluster assignment for unrolled loops

ICS '02 Proceedings of the 16th international conference on Supercomputing
An interleaved cache clustered VLIW processor

ICS '02 Proceedings of the 16th international conference on Supercomputing
Graph-partitioning based instruction scheduling for clustered processors

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Modulo scheduling with integrated register spilling for clustered VLIW architectures

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Cluster assignment for high-performance embedded VLIW processors

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Dynamic Code Partitioning for Clustered Architectures

International Journal of Parallel Programming
Optimizing Loop Performance for Clustered VLIW Architectures

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Exploiting Pseudo-Schedules to Guide Data Dependence Graph Partitioning

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Effective instruction scheduling techniques for an interleaved cache clustered VLIW processor

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Local scheduling techniques for memory coherence in a clustered VLIW processor with a distributed data cache

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Region-based hierarchical operation partitioning for multicluster processors

PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
The Effectiveness of Loop Unrolling for Modulo Scheduling in Clustered VLIW Architectures

ICPP '00 Proceedings of the Proceedings of the 2000 International Conference on Parallel Processing
Flexible Compiler-Managed L0 Buffers for Clustered VLIW Processors

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Removing communications in clustered microarchitectures through instruction replication

ACM Transactions on Architecture and Code Optimization (TACO)
Automatic data partitioning for the agere payload plus network processor

Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
Distributed Data Cache Designs for Clustered VLIW Processors

IEEE Transactions on Computers
Exploiting Vector Parallelism in Software Pipelined Loops

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Software and hardware techniques to optimize register file utilization in VLIW architectures

International Journal of Parallel Programming
Modulo graph embedding: mapping applications onto coarse-grained reconfigurable architectures

CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Minimizing bank selection instructions for partitioned memory architecture

CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Compiler-assisted leakage energy optimization for clustered VLIW architectures

EMSOFT '06 Proceedings of the 6th ACM & IEEE International conference on Embedded software
Virtual Cluster Scheduling Through the Scheduling Graph

Proceedings of the International Symposium on Code Generation and Optimization
Heterogeneous Clustered VLIW Microarchitectures

Proceedings of the International Symposium on Code Generation and Optimization
Stream execution on wide-issue clustered VLIW architectures

Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Minimal placement of bank selection instructions for partitioned memory architectures

ACM Transactions on Embedded Computing Systems (TECS)
Communication optimizations for global multi-threaded instruction scheduling

Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Modulo scheduling for highly customized datapaths to increase hardware reusability

Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Edge-centric modulo scheduling for coarse-grained reconfigurable architectures

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Energy-aware register file re-partitioning for clustered VLIW architectures

Proceedings of the 2009 Asia and South Pacific Design Automation Conference
An algorithm to improve parallelism in distributed systems using asynchronous calls

PPAM'07 Proceedings of the 7th international conference on Parallel processing and applied mathematics
Integrated Code Generation for Loops

ACM Transactions on Embedded Computing Systems (TECS)
Compiler-assisted energy optimization for clustered VLIW processors

Journal of Parallel and Distributed Computing
Compiler supports for VLIW DSP processors with SIMD intrinsics

Concurrency and Computation: Practice & Experience
A constraint programming approach for integrated spatial and temporal scheduling for clustered architectures

ACM Transactions on Embedded Computing Systems (TECS)
Fast modulo scheduler utilizing patternized routes for coarse-grained reconfigurable architectures

ACM Transactions on Architecture and Code Optimization (TACO)
Integrated modulo scheduling and cluster assignment for TI TMS320C64x+ architecture

Proceedings of the 11th Workshop on Optimizations for DSP and Embedded Systems

Quantified Score

Hi-index	0.01

Effective cluster assignment for modulo scheduling

Quantified Score

Visualization

Abstract