FLASH: Foresighted Latency-Aware Scheduling Heuristic for Processors with Customized Datapaths

Authors:
Manjunath Kudlur;Kevin Fan;Michael Chu;Rajiv Ravindran;Nathan Clark;Scott Mahlke
Venue:
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Year:
2004

Abstract

Application-specific instruction set processors (ASIPs) have the potential to meet the challenging cost, performance, and power goals of future embedded processors by customizing the hardware to suit an application. A central problem is creating compilers that are capable of dealing with the heterogeneous and non-uniform hardware created by the customization process. The processor datapath provides an effective area to customize, but specialized datapaths often have non-uniform connectivity between the function units, making the effective latency of a function unit dependent on the consuming operation. Traditional instruction schedulers break down in this environment because of their locally greedy nature: they bind the best choice for a single operation even though that choice may be poor due to a lack of communication paths to the operation's consumers. To effectively schedule with non-uniform connectivity, we propose a foresighted latency-aware scheduling heuristic (FLASH) that performs lookahead across future scheduling steps to estimate the effects of a potential binding. FLASH combines a set of lookahead heuristics to achieve effective foresight with low compile-time overhead.
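To make the failure mode concrete, the following Python sketch models a hypothetical datapath in which a value reaches its consumer quickly only if a direct bypass path exists between the producing and consuming function units; otherwise a fixed penalty is added (standing in, by assumption, for a trip through the register file). All names (BASE_LATENCY, BYPASS, NO_BYPASS_PENALTY, effective_latency, chain_time, greedy_bind, lookahead_bind), latencies, and the penalty are invented for illustration. The sketch compares a locally greedy binder with a brute-force lookahead binder on a two-operation dependence chain. It is not the FLASH algorithm: it ignores resource conflicts and cycle-level scheduling, and it enumerates future bindings exhaustively, a cost that grows exponentially with depth. The abstract states that FLASH instead combines a set of lookahead heuristics to keep compile-time overhead low.

```python
from itertools import product

# Hypothetical datapath: base latency (cycles) of each function unit (FU).
BASE_LATENCY = {"ALU0": 1, "ALU1": 2, "MUL": 3}

# Hypothetical bypass network: (producer FU, consumer FU) pairs with a direct path.
BYPASS = {("ALU1", "MUL")}

# Assumed extra cycles when no direct path exists (e.g. routing via the register file).
NO_BYPASS_PENALTY = 3


def effective_latency(producer_fu, consumer_fu):
    """Latency of a producer as seen by a consumer; depends on where the consumer runs."""
    latency = BASE_LATENCY[producer_fu]
    if (producer_fu, consumer_fu) not in BYPASS:
        latency += NO_BYPASS_PENALTY
    return latency


def chain_time(bindings):
    """Completion time of a dependence chain whose i-th op runs on bindings[i]."""
    total = sum(effective_latency(p, c) for p, c in zip(bindings, bindings[1:]))
    return total + BASE_LATENCY[bindings[-1]]


def greedy_bind(ops):
    """Bind each op to the FU that looks best for the chain so far (no foresight)."""
    bound = []
    for candidates in ops:
        bound.append(min(candidates, key=lambda fu: chain_time(bound + [fu])))
    return bound


def lookahead_bind(ops, depth=1):
    """Score each candidate FU by the best completion over the next `depth` ops."""
    bound = []
    for i, candidates in enumerate(ops):
        future = ops[i + 1 : i + 1 + depth]

        def score(fu):
            # product() over an empty list yields one empty tuple, so depth 0 is greedy.
            return min(chain_time(bound + [fu] + list(rest)) for rest in product(*future))

        bound.append(min(candidates, key=score))
    return bound


# Two-op chain: an add that may run on ALU0 or ALU1, feeding a multiply on MUL.
ops = [["ALU0", "ALU1"], ["MUL"]]
print(greedy_bind(ops), chain_time(greedy_bind(ops)))        # ['ALU0', 'MUL'] 7
print(lookahead_bind(ops), chain_time(lookahead_bind(ops)))  # ['ALU1', 'MUL'] 5
```

The greedy binder picks ALU0 because it is the faster unit in isolation, but ALU0 has no bypass path to MUL, so the chain finishes in 7 cycles. With depth=1 the lookahead binder sees that the slower ALU1 feeds MUL directly and finishes in 5 cycles; with depth=0 it reduces to the greedy binder.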