Loop Performance Improvement for Min-cut Program Decomposition Method

  • Authors:
  • Kanemitsu Ootsu;Takeshi Abe;Takashi Yokota;Takanobu Baba

  • Affiliations:
  • -;-;-;-

  • Venue:
  • ICNC '10 Proceedings of the 2010 First International Conference on Networking and Computing
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

In recent years, speedup by the thread level parallel processing becomes more and more important with the spread of multi-core processors, and various techniques for parallelizing the single thread code into the efficient multithreaded code that can achieve efficient thread level parallel processing have been developed. The speculative multithreading is an important technology for achieving the high performance by the thread level parallel processing. To improve the execution performance by speculative multithreading, it is necessary to appropriately decompose the program code. Against the background of this, T. A. Johnson, et al. proposed a program decomposition technique (hereafter, we refer to their method as min-cut method) for finding the decomposition pattern that can minimize the effects of the performance degradation factors, by finding the minimum cut set in the weighted control flow graph (CFG) of the program. The min-cut method is wide coverage and a very promising technique since the whole program can be decomposed without being restricted to the logical structures, such as loop. However, the min-cut method has a problem that the loop level parallelism cannot be utilized enough. In this paper, we propose an improved method for the min-cut method to enhance the loop execution performance. Our method is based on the min-cut method and tries to apply the loop unrolling to the loop in the target program codes during the process of the decomposition. We apply our method to the practical program codes selected from the SPEC CINT2000 benchmarks. The results show that the loops, that are not decomposed with the min-cut method, are decomposed, and the possibilities of making use of the loop level parallelism increased. In addition, the results of the performance evaluation by using a cycle-based simulator show that our method can improve the loop execution performance, as compared to the min-cut method.