Queue Usage and Memory-Level Parallelism Sensitive Scheduling

Authors:
Liu Zhanglin; Feng Xiaobing; Zhang Zhaoqing
Affiliations:
Institute of Computing Technology, Chinese Academy of Sciences; Institute of Computing Technology, Chinese Academy of Sciences; Institute of Computing Technology, Chinese Academy of Sciences
Venue:
HPCASIA '05 Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region
Year:
2005

Abstract

In out-of-order (OOO) processors, the reorder queue (ROQ) is widely used to implement precise interrupts. When the ROQ fills up, the whole processor stalls, and a long-latency operation, e.g. a load that misses in the caches, will almost certainly cause the ROQ to fill. In this paper we present a model for estimating the impact of issuing an instruction on ROQ usage and memory-level parallelism (MLP), and we incorporate these considerations into the cost model of instruction scheduling. Preliminary evaluation results demonstrate the effectiveness of our approach in reducing the time during which the ROQ is full and in improving performance.
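To make the interaction between ROQ occupancy and MLP concrete, below is a minimal Python sketch of a toy machine. It is not the authors' model. It assumes hypothetical parameters: instructions with no dependences that start executing as soon as they enter the ROQ, strictly in-order retirement, single-wide dispatch and retirement, a 32-entry ROQ, and a 200-cycle cache-miss latency. The function name `simulate` and all of these numbers are illustrative assumptions. It compares two orderings of the same instructions: one where two missing loads are far apart, and one where they are issued back to back.

```python
from collections import deque


def simulate(latencies, roq_size=32, width=1):
    """Toy model: return (total_cycles, roq_full_cycles).

    Assumptions (hypothetical, for illustration only): instructions are
    independent and start executing the cycle they enter the ROQ; up to
    `width` instructions dispatch and up to `width` retire per cycle;
    retirement is strictly in program order from the ROQ head.
    """
    roq = deque()  # completion cycle of each in-flight instruction, oldest first
    next_insn = 0
    cycle = 0
    full_cycles = 0
    while next_insn < len(latencies) or roq:
        # In-order retirement: only completed instructions at the head may leave.
        retired = 0
        while roq and retired < width and roq[0] <= cycle:
            roq.popleft()
            retired += 1
        # Dispatch into the ROQ; a full ROQ blocks the front end for this cycle.
        dispatched = 0
        while next_insn < len(latencies) and dispatched < width:
            if len(roq) == roq_size:
                full_cycles += 1
                break
            roq.append(cycle + latencies[next_insn])
            next_insn += 1
            dispatched += 1
        cycle += 1
    return cycle, full_cycles


MISS, HIT = 200, 1  # hypothetical latencies in cycles
spread = [MISS] + [HIT] * 40 + [MISS] + [HIT] * 40  # misses far apart
clustered = [MISS, MISS] + [HIT] * 80               # same work, misses adjacent

print("spread:   ", simulate(spread))
print("clustered:", simulate(clustered))
```

With these parameters the spread ordering yields (450, 336) and the clustered ordering yields (282, 168). In the spread case the ROQ fills behind each miss in turn. In the clustered case the second miss executes in the shadow of the first, so the two latencies overlap (higher MLP) and one whole ROQ-full stall disappears. A real scheduler would also have to respect data dependences and predict which loads are likely to miss, both of which this toy ignores.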