Cluster assignment for high-performance embedded VLIW processors

Authors:
Viktor S. Lapinskii; Margarida F. Jacome; Gustavo A. De Veciana
Affiliations:
The University of Texas at Austin, Austin, TX; The University of Texas at Austin, Austin, TX; The University of Texas at Austin, Austin, TX
Venue:
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Year:
2002

Abstract

Clustering is an effective method to increase the available parallelism in VLIW datapaths without incurring the severe penalties associated with a large number of register file ports. Efficient utilization of a clustered datapath requires careful binding (assignment) of operations to clusters. The article proposes a binding algorithm that effectively explores trade-offs between in-cluster operation serialization and delays associated with data transfers between clusters. Extensive experimental evidence is provided showing that the algorithm generates high-quality solutions for representative kernels, with up to 33% improvement over a state-of-the-art binding algorithm.
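
The trade-off named in the abstract can be illustrated with a toy sketch. This is not the article's algorithm: it is a simple greedy binder written for illustration, and the function name `greedy_bind`, the single functional unit per cluster, the unit operation latency, and the fixed `transfer_delay` are all assumptions introduced here. Each operation is bound to the cluster where it can start earliest, paying a delay whenever an operand comes from another cluster.

```python
def greedy_bind(ops, preds, num_clusters, transfer_delay):
    """Toy greedy binding (illustrative only, not the article's method).

    ops: operation names in topological order.
    preds: dict mapping an operation to the operations it depends on.
    Assumes one functional unit per cluster and unit-latency operations.
    Returns (cluster assignment, schedule length in cycles).
    """
    cluster_of = {}
    finish = {}
    free_at = [0] * num_clusters  # first free cycle on each cluster

    for op in ops:
        best = None
        for c in range(num_clusters):
            # Operands produced on another cluster arrive transfer_delay later.
            ready = 0
            for p in preds.get(op, []):
                penalty = 0 if cluster_of[p] == c else transfer_delay
                ready = max(ready, finish[p] + penalty)
            # The cluster's single unit may still be busy (serialization).
            start = max(ready, free_at[c])
            if best is None or start < best[0]:
                best = (start, c)
        start, c = best
        cluster_of[op] = c
        finish[op] = start + 1
        free_at[c] = finish[op]

    return cluster_of, max(finish.values())


# Small dataflow graph: c needs a and b, d needs c.
ops = ["a", "b", "c", "d"]
preds = {"c": ["a", "b"], "d": ["c"]}

print(greedy_bind(ops, preds, num_clusters=1, transfer_delay=0)[1])  # 4
print(greedy_bind(ops, preds, num_clusters=2, transfer_delay=1)[1])  # 4
print(greedy_bind(ops, preds, num_clusters=2, transfer_delay=3)[1])  # 6
```

In this toy case, spreading `a` and `b` across two clusters gains nothing once the transfer delay is paid, and with a delay of 3 cycles the myopic choice is worse than serializing everything on one cluster. A binding algorithm of the kind the abstract describes has to weigh exactly these two costs against each other.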