An analysis of loop latency in dataflow execution

  • Authors:
  • Walid A. Najjar; W. Marcus Miller; A. P. Wim Böhm

  • Venue:
  • ISCA '92: Proceedings of the 19th Annual International Symposium on Computer Architecture
  • Year:
  • 1992

Abstract

Recent evidence indicates that the exploitation of locality in dataflow programs could have a dramatic impact on performance. The current trend in the design of dataflow processors suggests a synthesis of traditional non-strict fine grain instruction execution and strict coarse grain execution in order to exploit locality. While an increase in instruction granularity favors the exploitation of locality within a single execution thread, the resulting grain size may increase latency among execution threads. In this paper, we analyze the latency incurred through the partitioning of fine grain instructions into coarse grain clusters and quantify coarse grain input and output latencies using a set of numeric benchmarks. The results offer compelling evidence that the inner loops of a significant number of numeric codes would benefit from coarse grain execution. Based on cluster execution times, more than 60% of the measured benchmarks favor coarse grain execution. In 64% of the cases, the input latency to the cluster is the same in coarse and fine grain execution modes. The results suggest that the effects of increased instruction granularity on latency are minimal for a high percentage of the measured codes and are in large part offset by available intra-thread locality. Furthermore, simulation results indicate that strict or non-strict data structure access does not change the basic cluster characteristics.
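
The trade-off described in the abstract (a larger grain exploits locality within a cluster, but a strict cluster must wait for all of its inputs and so can delay consumers in other threads) can be made concrete with a toy timing model. The sketch below is an illustration only: the dependence chains, the unit instruction cost, and the names CHAINS, UNIT, fine_grain, and coarse_grain are assumptions made for exposition, not the paper's benchmarks or simulation methodology.

    # A minimal sketch, assuming a toy latency model: it contrasts non-strict
    # fine grain firing with strict coarse grain (cluster) execution.

    UNIT = 1  # assumed cost of one fine grain instruction, in cycles

    # Cluster body: each output depends on one external input token and a
    # short chain of fine grain instructions (output -> (input, chain length)).
    CHAINS = {"out0": ("in0", 2), "out1": ("in1", 2)}

    def fine_grain(arrival):
        """Non-strict: each output is produced as soon as its own input
        token arrives and its chain of instructions has fired."""
        return {out: arrival[inp] + n * UNIT for out, (inp, n) in CHAINS.items()}

    def coarse_grain(arrival):
        """Strict: the cluster waits for all inputs, then executes the whole
        body sequentially as a single coarse grain thread."""
        start = max(arrival.values())
        body = sum(n for _, n in CHAINS.values())
        return {out: start + body * UNIT for out in CHAINS}

    arrival = {"in0": 0, "in1": 6}                 # in1 arrives late from another thread
    print("fine grain  :", fine_grain(arrival))    # {'out0': 2, 'out1': 8}
    print("coarse grain:", coarse_grain(arrival))  # {'out0': 10, 'out1': 10}

In this toy run the fine grain schedule delivers out0 at cycle 2, while the strict cluster delays both outputs until cycle 10 because it must wait for the late token in1 before firing. This is the kind of inter-thread latency cost that the paper's measurements quantify for the inner loops of real numeric codes.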