Application-specific instruction set processors (ASIPs) have the potential to meet the challenging cost, performance, and power goals of future embedded processors by customizing the hardware to suit an application. A central problem is creating compilers that can handle the heterogeneous, non-uniform hardware produced by the customization process. The processor datapath is an effective area to customize, but specialized datapaths often have non-uniform connectivity between function units. As a result, the effective latency of a function unit depends on the operation that consumes its result. Traditional instruction schedulers break down in this environment because they are locally greedy. They bind each operation to the choice that looks best for that operation alone, even though that choice may prove poor later because the required communication paths are missing. To schedule effectively under non-uniform connectivity, we propose a foresighted latency-aware scheduling heuristic (FLASH) that performs lookahead across future scheduling steps to estimate the effects of a potential binding. FLASH combines a set of lookahead heuristics to achieve effective foresight with low compile-time overhead.
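The toy sketch below illustrates why greedy binding fails on a datapath with non-uniform connectivity and how even one step of lookahead can help. It is not the FLASH algorithm. Everything in it is a hypothetical assumption chosen for illustration:

* the two-unit datapath (`FU0`, `FU1`) and its latencies;
* the bypass set and the 2-cycle register-file penalty;
* the helper names `comm`, `earliest_start`, `binding_cost`, `schedule`, and `makespan`.

The lookahead here is a crude single-step estimate. It ignores a consumer's other operands and resource conflicts.

```python
from dataclasses import dataclass, field

# Hypothetical two-unit datapath (illustrative numbers, not from the paper).
# LATENCY[(unit, opcode)] is the execution latency; a missing key means the
# unit cannot execute that opcode.
LATENCY = {("FU0", "add"): 1, ("FU1", "add"): 2, ("FU1", "mul"): 1}
UNITS = ("FU0", "FU1")
# Direct bypass paths (src, dst). Without one, a value must round-trip
# through the register file, costing REGFILE_DELAY extra cycles.
BYPASS = {("FU0", "FU0"), ("FU1", "FU1"), ("FU1", "FU0")}
REGFILE_DELAY = 2


@dataclass(eq=False)  # identity comparison: each Op is a distinct DAG node
class Op:
    name: str
    opcode: str
    preds: list = field(default_factory=list)


def comm(src, dst):
    """Extra cycles to move a value from unit src to unit dst."""
    return 0 if (src, dst) in BYPASS else REGFILE_DELAY


def capable(op):
    """Units that can execute op."""
    return [fu for fu in UNITS if (fu, op.opcode) in LATENCY]


def earliest_start(op, fu, sched, busy):
    """First cycle at which all operands have reached fu and fu can issue."""
    ready = 0
    for p in op.preds:
        p_fu, p_start = sched[p.name]
        # Effective latency depends on where the consumer is placed.
        ready = max(ready, p_start + LATENCY[(p_fu, p.opcode)] + comm(p_fu, fu))
    # Units are assumed fully pipelined: only the issue cycle is reserved.
    while (fu, ready) in busy:
        ready += 1
    return ready


def binding_cost(op, fu, sched, busy, succs, lookahead):
    """Cost tuple of binding op to fu; lower is better."""
    finish = earliest_start(op, fu, sched, busy) + LATENCY[(fu, op.opcode)]
    if not lookahead or not succs[op.name]:
        return (finish,)  # greedy: best choice for this op alone
    # One-step lookahead: earliest cycle the value could reach each consumer
    # on that consumer's best unit; the slowest consumer dominates.
    reach = max(
        min(finish + comm(fu, s_fu) for s_fu in capable(s))
        for s in succs[op.name]
    )
    return (reach, finish)


def schedule(ops, lookahead):
    """List-schedule ops (given in topological order), binding each to a unit."""
    sched, busy = {}, set()
    succs = {op.name: [s for s in ops if op in s.preds] for op in ops}
    for op in ops:
        best = min(capable(op),
                   key=lambda fu: binding_cost(op, fu, sched, busy, succs, lookahead))
        sched[op.name] = (best, earliest_start(op, best, sched, busy))
        busy.add(sched[op.name])
    return sched


def makespan(ops, sched):
    """Cycle at which the last operation completes."""
    return max(start + LATENCY[(fu, op.opcode)]
               for op in ops for fu, start in [sched[op.name]])


x = Op("x", "add")
y = Op("y", "mul", [x])  # the multiply exists only on FU1
ops = [x, y]
greedy = schedule(ops, lookahead=False)
foresighted = schedule(ops, lookahead=True)
print(greedy, makespan(ops, greedy))            # {'x': ('FU0', 0), 'y': ('FU1', 3)} 4
print(foresighted, makespan(ops, foresighted))  # {'x': ('FU1', 0), 'y': ('FU1', 2)} 3
```

In this example the greedy pass binds `x` to the faster adder on FU0. FU0 has no bypass to FU1, so the multiply waits for the register-file round trip. The one-step lookahead instead accepts the slower adder on FU1, because it keeps the value on a connected path and finishes a cycle earlier. FLASH's actual combination of lookahead heuristics is not reproduced here.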