In out-of-order (OOO) processors, the reorder queue (ROQ) is widely used to implement precise interrupts. When the ROQ fills, the whole processor stalls, and a long-latency operation, e.g. a load that misses in the caches, will almost certainly fill it. In this paper we present a model for estimating the impact of issuing an instruction on ROQ usage and memory-level parallelism (MLP), and incorporate these considerations into the cost model of instruction scheduling. Preliminary evaluation results are presented to demonstrate the effectiveness of our approach in reducing the time the ROQ is full and improving performance.
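The effect described in the abstract can be illustrated with a toy simulation. The sketch below is not the model proposed in the paper. It is a minimal illustration under assumed conditions: one instruction is dispatched per cycle, instructions are independent and start executing at dispatch, and any number of completed instructions may retire in order each cycle. The function name `simulate`, the 8-entry ROQ, and the 100-cycle miss latency are all hypothetical choices made for the example.

```python
from collections import deque


def simulate(latencies, roq_size):
    """Toy in-order-retire ROQ model (illustrative assumptions only).

    latencies: per-instruction execution latency, in program order.
    Returns (total_cycles, cycles_in_which_dispatch_was_blocked_by_a_full_ROQ).
    """
    roq = deque()      # completion cycle of each in-flight instruction, in program order
    next_instr = 0     # index of the next instruction to dispatch
    cycle = 0
    full_cycles = 0
    while next_instr < len(latencies) or roq:
        # Retire completed instructions from the head only (in-order retirement).
        while roq and roq[0] <= cycle:
            roq.popleft()
        if next_instr < len(latencies):
            if len(roq) < roq_size:
                # Dispatch one instruction; it completes after its latency.
                roq.append(cycle + latencies[next_instr])
                next_instr += 1
            else:
                # A long-latency instruction at the head blocks retirement,
                # so the ROQ stays full and dispatch stalls.
                full_cycles += 1
        cycle += 1
    return cycle, full_cycles


MISS, HIT = 100, 1  # hypothetical latencies for a cache miss and a short operation

# Two schedules of the same 21 instructions, each containing two load misses.
spread = [MISS] + [HIT] * 10 + [MISS] + [HIT] * 9   # misses too far apart to overlap
clustered = [MISS, MISS] + [HIT] * 19               # misses adjacent, so they overlap

print("spread:   ", simulate(spread, roq_size=8))
print("clustered:", simulate(clustered, roq_size=8))
```

Under these assumptions the spread schedule takes 206 cycles with the ROQ full for 184 of them, while the clustered schedule takes 114 cycles with the ROQ full for 92. The two misses overlap only in the clustered schedule, which is the kind of interaction between instruction order, ROQ occupancy, and MLP that a scheduling cost model would need to capture.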