Design and evaluation of an auto-memoization processor

  • Authors:
  • Tomoaki Tsumura;Ikuma Suzuki;Yasuki Ikeuchi;Hiroshi Matsuo;Hiroshi Nakashima;Yasuhiko Nakashima

  • Affiliations:
  • Nagoya Inst. of Tech., Showa, Nagoya, Japan;Toyohashi Univ. of Tech, Toyohashi, Aichi, Japan;Toyohashi Univ. of Tech, Toyohashi, Aichi, Japan;Nagoya Inst. of Tech., Gokiso, Showa, Nagoya, Japan;Academic Center for Computing and Media Studies, Kyoto Univ., Yoshida, Sakyo, Kyoto, Japan;Grad. School of Info. Sci., Nara Inst. of Sci. and Tech., Ikoma, Nara, Japan

  • Venue:
  • PDCN'07: Proceedings of the 25th IASTED International Multi-Conference: Parallel and Distributed Computing and Networks
  • Year:
  • 2007


Abstract

This paper describes the design and evaluation of an auto-memoization processor. The key point of the proposal is that multilevel functions and loops are detected without any additional instructions inserted by a compiler. This general-purpose processor detects functions and loops, and memoizes them automatically and dynamically; hence, existing load modules and binary programs can gain speedup without recompilation or rewriting. We also propose parallel execution with multiple speculative cores and one main memoization core. While the main core executes a memoizable region, the speculative cores execute the same region simultaneously using predicted inputs. This makes it possible to omit the execution of instruction regions whose inputs increase or decrease monotonically, and can make effective use of surplus cores in the coming many-core era. Experiments with GENEsYs genetic algorithm programs show that our auto-memoization processor gains significantly large speedups, up to 7.1-fold and 1.6-fold on average. Results with the SPEC CPU95 benchmark suite show that auto-memoization with three speculative cores achieves up to a 2.9-fold speedup for 102.swim and 1.4-fold on average. They also show that the parallel execution by the speculative cores reduces cache misses, much like prefetching.
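The two mechanisms the abstract describes can be sketched in software. This is only an analogy: the paper proposes a hardware reuse table and parallel speculative cores, whereas the sketch below uses an in-process dictionary and a sequential "warm-up" loop; all names (`memoize`, `heavy_region`) are illustrative, not from the paper.

```python
# Software analogy of the auto-memoization idea: a reuse table keyed by
# a region's inputs, so a matching input skips re-execution entirely.

def memoize(fn):
    table = {}  # reuse table mapping inputs to previously computed outputs

    def wrapper(*args):
        if args in table:        # input match: omit execution, return stored result
            return table[args]
        result = fn(*args)       # miss: execute the region and record its output
        table[args] = result
        return result

    wrapper.table = table        # exposed so we can inspect reuse hits
    return wrapper

@memoize
def heavy_region(n):
    # stand-in for a memoizable function or loop body
    return sum(i * i for i in range(n))

# Analogy for the speculative cores: when inputs are observed to increase
# monotonically, precompute predicted future inputs so that the main
# core's later calls hit the reuse table instead of executing.
for predicted_input in (1000, 2000, 3000):
    heavy_region(predicted_input)
```

In the processor itself this reuse test is done by hardware input matching rather than a dictionary lookup, and the speculative precomputation runs concurrently on surplus cores instead of ahead of time on one core.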