Modulo scheduling for a fully-distributed clustered VLIW architecture

Authors:
Jesús Sánchez; Antonio González
Affiliations:
Dept. of Computer Architecture, Universitat Politècnica de Catalunya, Barcelona, Spain; Dept. of Computer Architecture, Universitat Politècnica de Catalunya, Barcelona, Spain
Venue:
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Year:
2000

Citing 19 (works cited by this paper, listed first)
Cited 27 (works that cite this paper, listed second)

Bulldog: a compiler for VLIW architectures

Bulldog: a compiler for VLIW architectures
Software pipelining: an effective scheduling technique for VLIW machines

PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Design and evaluation of a compiler algorithm for prefetching

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Partitioned register files for VLIWs: a preliminary analysis of tradeoffs

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Cache miss equations: an analytical representation of cache misses

ICS '97 Proceedings of the 11th international conference on Supercomputing
Complexity-effective superscalar processors

Proceedings of the 24th annual international symposium on Computer architecture
Cache sensitive modulo scheduling

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Effective cluster assignment for modulo scheduling

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Unified assign and schedule: a new approach to scheduling for clustered register file microarchitectures

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Clock rate versus IPC: the end of the road for conventional microarchitectures

Proceedings of the 27th annual international symposium on Computer architecture
Parallel Computer Architecture: A Hardware/Software Approach

Parallel Computer Architecture: A Hardware/Software Approach
Will Physical Scalability Sabotage Performance Gains?

Computer
The TigerSHARC DSP Architecture

IEEE Micro
Lockup-free instruction fetch/prefetch cache organization

ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing

MICRO 14 Proceedings of the 14th annual workshop on Microprogramming
Distributed Modulo Scheduling

HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
The Effectiveness of Loop Unrolling for Modulo Scheduling in Clustered VLIW Architectures

ICPP '00 Proceedings of the 2000 International Conference on Parallel Processing
Inherently lower-power high-performance superscalar architectures

Inherently lower-power high-performance superscalar architectures
An efficient solver for Cache Miss Equations

ISPASS '00 Proceedings of the 2000 IEEE International Symposium on Performance Analysis of Systems and Software

An interleaved cache clustered VLIW processor

ICS '02 Proceedings of the 16th international conference on Supercomputing
Graph-partitioning based instruction scheduling for clustered processors

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Enhancing loop buffering of media and telecommunications applications using low-overhead predication

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Exploiting Pseudo-Schedules to Guide Data Dependence Graph Partitioning

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Effective instruction scheduling techniques for an interleaved cache clustered VLIW processor

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Local scheduling techniques for memory coherence in a clustered VLIW processor with a distributed data cache

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
A scalable wide-issue clustered VLIW with a reconfigurable interconnect

Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Flexible Compiler-Managed L0 Buffers for Clustered VLIW Processors

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
A fast and accurate framework to analyze and optimize cache memory behavior

ACM Transactions on Programming Languages and Systems (TOPLAS)
Cluster prefetch: tolerating on-chip wire delays in clustered microarchitectures

Proceedings of the 18th annual international conference on Supercomputing
Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams

Proceedings of the 31st annual international symposium on Computer architecture
Instruction buffering exploration for low energy VLIWs with instruction clusters

Proceedings of the 2004 Asia and South Pacific Design Automation Conference
Compiler-Directed ILP Extraction for Clustered VLIW/EPIC Machines: Predication, Speculation and Modulo Scheduling

DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Clustered Loop Buffer Organization for Low Energy VLIW Embedded Processors

IEEE Transactions on Computers
Distributed Data Cache Designs for Clustered VLIW Processors

IEEE Transactions on Computers
A Distributed Control Path Architecture for VLIW Processors

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Variable-Based Multi-module Data Caches for Clustered VLIW Processors

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Register aware scheduling for distributed cache clustered architecture

ASP-DAC '03 Proceedings of the 2003 Asia and South Pacific Design Automation Conference
Compiler-directed Data Partitioning for Multicluster Processors

Proceedings of the International Symposium on Code Generation and Optimization
Modulo graph embedding: mapping applications onto coarse-grained reconfigurable architectures

CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Very wide register: an asymmetric register file organization for low power embedded processors

Proceedings of the conference on Design, automation and test in Europe
Efficient implementation of nested-loop multimedia algorithms

EURASIP Journal on Applied Signal Processing
Modulo scheduling for highly customized datapaths to increase hardware reusability

Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Orchestrating the execution of stream programs on multicore platforms

Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Edge-centric modulo scheduling for coarse-grained reconfigurable architectures

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Energy-aware register file re-partitioning for clustered VLIW architectures

Proceedings of the 2009 Asia and South Pacific Design Automation Conference
Fast modulo scheduler utilizing patternized routes for coarse-grained reconfigurable architectures

ACM Transactions on Architecture and Code Optimization (TACO)
