Custom Data Layout for Memory Parallelism

  • Authors:
  • Byoungro So; Mary W. Hall; Heidi E. Ziegler

  • Venue:
  • Proceedings of the International Symposium on Code Generation and Optimization: Feedback-Directed and Runtime Optimization
  • Year:
  • 2004

Abstract

In this paper, we describe a generalized approach to deriving a custom data layout in multiple memory banks for array-based computations, to facilitate high-bandwidth parallel memory accesses in modern architectures where multiple memory banks can simultaneously feed one or more functional units. We do not use a fixed data layout, but rather select application-specific layouts according to access patterns in the code. A unique feature of this approach is its flexibility in the presence of code reordering transformations, such as the loop nest transformations commonly applied to array-based computations. We have implemented this algorithm in the DEFACTO system, a design environment for automatically mapping C programs to hardware implementations for FPGA-based systems. We present experimental results for five multimedia kernels that demonstrate the benefits of this approach. Our results show that custom data layout yields results as good as, or better than, naive or fixed cyclic layouts, and is significantly better for certain access patterns and in the presence of code reordering transformations. When used in conjunction with unrolling loops in a nest to expose instruction-level parallelism, we observe greater than a 75% reduction in the number of memory access cycles and speedups ranging from 3.96 to 46.7 for 8 memories, as compared to using a single memory with no unrolling.