Profile-driven instruction level parallel scheduling with application to super blocks

  • Authors:
  • C. Chekuri;R. Johnson;R. Motwani;B. Natarajan;B. R. Rau;M. Schlansker

  • Affiliations:
  • Dept. of Comp. Sci., Stanford Univ., Stanford, CA;Hewlett Packard Labs, 1501 Page Mill Rd, Palo Alto, CA;Dept. of Comp. Sci., Stanford Univ., Stanford, CA;Hewlett Packard Labs, 1501 Page Mill Rd, Palo Alto, CA;Hewlett Packard Labs, 1501 Page Mill Rd, Palo Alto, CA;Hewlett Packard Labs, 1501 Page Mill Rd, Palo Alto, CA

  • Venue:
  • Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
  • Year:
  • 1996

Abstract

Code scheduling to exploit instruction level parallelism (ILP) is a critical problem in compiler optimization research in light of the increased use of long-instruction-word machines. Unfortunately, optimal scheduling is computationally intractable, and one must resort to carefully crafted heuristics in practice. If the scope of application of a scheduling heuristic is limited to basic blocks, considerable performance loss may be incurred at block boundaries. To overcome this obstacle, basic blocks can be coalesced across branches to form larger regions such as super blocks. In the literature, these regions are typically scheduled using algorithms that either are oblivious to profile information (under the assumption that the process of forming the region has fully utilized the profile information) or use the profile information as an addendum to classical scheduling techniques. We believe that even for the simple case of linear code regions such as super blocks, additional performance improvement can be gained by utilizing the profile information in scheduling as well. We propose a general paradigm for converting any profile-insensitive list scheduler to a profile-sensitive scheduler. Our technique is developed via a theoretical analysis of a simplified abstract model of the general problem of profile-driven scheduling over any acyclic code region, yielding a scoring measure for ranking branch instructions.
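To make the idea of turning a profile-insensitive list scheduler into a profile-sensitive one concrete, here is a minimal sketch in Python. It is not the scoring measure derived in the paper, which the abstract does not state. The toy superblock, the exit probabilities, the unit latencies, the single-issue machine, and the names `SUCCS`, `EXIT_PROB`, `height`, `exit_weight`, and `list_schedule` are all assumptions made for illustration. The same list scheduler is run with two priority functions: classical critical-path height, and a stand-in profile-aware rank that favors operations feeding the most probable exits.

```python
from typing import Callable, Dict, List, Set

# Hypothetical toy superblock: op -> successor ops (dependence edges).
# b1 and b2 are exit branches; the edge b1 -> b2 keeps the exits in order.
SUCCS: Dict[str, List[str]] = {
    "a": ["b1"], "b1": ["b2"],
    "c": ["d"], "d": ["e"], "e": ["b2"], "b2": [],
}
# Assumed profile data: probability that control leaves at each exit.
EXIT_PROB: Dict[str, float] = {"b1": 0.9, "b2": 0.1}


def height(op: str) -> int:
    """Profile-insensitive priority: longest dependence path to a sink."""
    return 1 + max((height(s) for s in SUCCS[op]), default=0)


def reachable_exits(op: str) -> Set[str]:
    """Exit branches that depend on op (including op itself if it is one)."""
    exits = {op} if op in EXIT_PROB else set()
    for s in SUCCS[op]:
        exits |= reachable_exits(s)
    return exits


def exit_weight(op: str) -> float:
    """Illustrative profile-aware rank: total probability of dependent exits."""
    return sum(EXIT_PROB[b] for b in reachable_exits(op))


def list_schedule(priority: Callable[[str], object], width: int = 1) -> Dict[str, int]:
    """Generic list scheduler: unit latency, `width` issue slots per cycle."""
    preds: Dict[str, Set[str]] = {op: set() for op in SUCCS}
    for op, succs in SUCCS.items():
        for s in succs:
            preds[s].add(op)
    issue: Dict[str, int] = {}
    cycle = 0
    while len(issue) < len(SUCCS):
        # An op is ready once every predecessor issued in an earlier cycle.
        ready = sorted(
            op for op in SUCCS
            if op not in issue
            and all(p in issue and issue[p] < cycle for p in preds[op])
        )
        # Stable sort: highest priority first, ties broken by name.
        ready.sort(key=priority, reverse=True)
        for op in ready[:width]:
            issue[op] = cycle
        cycle += 1
    return issue


def expected_exit_time(issue: Dict[str, int]) -> float:
    """Expected cycle at which control leaves the superblock."""
    return sum(p * (issue[b] + 1) for b, p in EXIT_PROB.items())


insensitive = list_schedule(height)
sensitive = list_schedule(lambda op: (exit_weight(op), height(op)))
print(expected_exit_time(insensitive), expected_exit_time(sensitive))
```

On this toy input the critical-path scheduler starts with the long chain that feeds the rarely taken exit `b2`, which delays the frequently taken exit `b1`. The profile-aware rank issues `a` and `b1` first and lowers the expected exit time. Only the priority function changes between the two runs, which mirrors the conversion paradigm the abstract describes.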