Bulldog: a compiler for VLSI architectures
Bulldog: a compiler for VLSI architectures
Efficient instruction scheduling for a pipelined architecture
SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction
MIPS RISC architecture
Computer architecture: a quantitative approach
Computer architecture: a quantitative approach
Instruction scheduling for the IBM RISC System/6000 processor
IBM Journal of Research and Development
Scheduling time-critical instructions on RISC machines
POPL '90 Proceedings of the 17th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Lockup-free caches in high-performance multiprocessors
Journal of Parallel and Distributed Computing
High-bandwidth data memory systems for superscalar processors
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
ICS '90 Proceedings of the 4th international conference on Supercomputing
APRIL: a processor architecture for multiprocessing
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
MC88100 Microprocessors User's Manual
MC88100 Microprocessors User's Manual
Code generation and reorganization in the presence of pipeline constraints
POPL '82 Proceedings of the 9th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Lockup-free instruction fetch/prefetch cache organization
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Instruction scheduling and executable editing
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Predictability of load/store instruction latencies
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Cache sensitive modulo scheduling
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Cache-conscious data placement
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Code transformations to improve memory parallelism
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Handling Global Constraints in Compiler Strategy
International Journal of Parallel Programming
Load Scheduling with Profile Information
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Static Scheduling of Instructions on Micronet-based Asynchronous Processors
ASYNC '96 Proceedings of the 2nd International Symposium on Advanced Research in Asynchronous Circuits and Systems
Efficient instruction scheduling for a pipelined architecture
ACM SIGPLAN Notices - Best of PLDI 1979-1999
Static Placement, Dynamic Issue (SPDI) Scheduling for EDGE Architectures
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Queue Usage and Memory-Level Parallelism Sensitive Scheduling
HPCASIA '05 Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region
Instruction scheduling for a tiled dataflow architecture
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Latency-tolerant software pipelining in a production compiler
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Variable latency caches for nanoscale processor
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Engineering scalable, cache and space efficient tries for strings
The VLDB Journal — The International Journal on Very Large Data Bases
Redesigning the string hash table, burst trie, and BST to exploit cache
Journal of Experimental Algorithmics (JEA)
Feedback-Based global instruction scheduling for GPGPU applications
ICCSA'12 Proceedings of the 12th international conference on Computational Science and Its Applications - Volume Part I
Hi-index | 0.00 |
Traditional list schedulers order instructions based on an optimistic estimate of the load delay imposed by the implementation. Therefore they cannot respond to variations in load latencies (due to cache hits or misses, congestion in the memory interconnect, etc.) and cannot easily be applied across different implementations. We have developed an alternative algorithm, known as balanced scheduling, that schedule instructions based on an estimate of the amount of instruction level parallelism in the program. Since scheduling decisions are program-rather than machine-based, balanced scheduling is unaffected by implementation changes. Since it is based on the amount of instruction level parallelism that a program can support, it can respond better to variations in load latencies. Performance improvements over a traditional list scheduler on a Fortran workload and simulating several different machine types (cache-based workstations, large parallel machines with a multipath interconnect and a combination, all with non-blocking processors) are quite good, averaging between 3% and 18%.