Design and evaluation of an auto-memoization processor

  • Authors:
  • Tomoaki Tsumura;Ikuma Suzuki;Yasuki Ikeuchi;Hiroshi Matsuo;Hiroshi Nakashima;Yasuhiko Nakashima

  • Affiliations:
  • Nagoya Inst. of Tech., Showa, Nagoya, Japan;Toyohashi Univ. of Tech, Toyohashi, Aichi, Japan;Toyohashi Univ. of Tech, Toyohashi, Aichi, Japan;Nagoya Inst. of Tech., Gokiso, Showa, Nagoya, Japan;Academic Center for Computing and Media Studies, Kyoto Univ., Yoshida, Sakyo, Kyoto, Japan;Grad. School of Info. Sci., Nara Inst. of Sci. and Tech., Ikoma, Nara, Japan

  • Venue:
  • PDCN'07: Proceedings of the 25th IASTED International Multi-Conference: Parallel and Distributed Computing and Networks
  • Year:
  • 2007


Abstract

This paper describes the design and evaluation of an auto-memoization processor. The key point of the proposal is that multilevel functions and loops are detected without any additional instructions inserted by a compiler. This general-purpose processor detects functions and loops, and memoizes them automatically and dynamically; hence, existing load modules and binary programs can gain speedup without recompilation or rewriting. We also propose parallel execution with multiple speculative cores and one main memoization core. While the main core executes a memoizable region, the speculative cores execute the same region simultaneously using predicted inputs. This makes it possible to omit the execution of instruction regions whose inputs increase or decrease monotonically, and can make effective use of surplus cores in the coming many-core era. Experiments with GENEsYs genetic algorithm programs show that our auto-memoization processor gains significantly large speedups, up to 7.1-fold and 1.6-fold on average. Results with the SPEC CPU95 benchmark suite show that auto-memoization with three speculative cores achieves up to a 2.9-fold speedup for 102.swim and 1.4-fold on average. They also show that the parallel execution by the speculative cores reduces cache misses, much like prefetching.
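The two mechanisms the abstract describes can be sketched in software. This is only an analogy: the paper proposes a hardware reuse table and parallel speculative cores, whereas the sketch below uses an in-process dictionary and a sequential "warm-up" loop; all names (`memoize`, `heavy_region`) are illustrative, not from the paper.

```python
# Software analogy of the auto-memoization idea: a reuse table keyed by
# a region's inputs, so a matching input skips re-execution entirely.

def memoize(fn):
    table = {}  # reuse table mapping inputs to previously computed outputs

    def wrapper(*args):
        if args in table:        # input match: omit execution, return stored result
            return table[args]
        result = fn(*args)       # miss: execute the region and record its output
        table[args] = result
        return result

    wrapper.table = table        # exposed so we can inspect reuse hits
    return wrapper

@memoize
def heavy_region(n):
    # stand-in for a memoizable function or loop body
    return sum(i * i for i in range(n))

# Analogy for the speculative cores: when inputs are observed to increase
# monotonically, precompute predicted future inputs so that the main
# core's later calls hit the reuse table instead of executing.
for predicted_input in (1000, 2000, 3000):
    heavy_region(predicted_input)
```

In the processor itself this reuse test is done by hardware input matching rather than a dictionary lookup, and the speculative precomputation runs concurrently on surplus cores instead of ahead of time on one core.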