Array organization in parallel memories

  • Authors: Mayez Al-Mouhamed

  • Affiliations: Computer Engineering Department, College of Computer Science and Engineering, King Fahd University of Petroleum and Minerals, P.O. Box 787, Dhahran 31261, Saudi Arabia

  • Venue: International Journal of Parallel Programming
  • Year: 2004

Abstract

The bandwidth mismatch between processor and main memory is a major throughput-limiting problem. Although streamed computations have predictable access patterns, their references have little temporal locality and are generally too long to cache. We present a memory and compiler co-optimization that reduces low-level memory accesses through software and hardware locality optimizations. We propose a scalable and predictable parallel memory based on compiler synthesis of storage schemes for multi-dimensional arrays that are accessed by an arbitrary but known set of data access patterns. Using the algebra of non-singular Boolean matrices, we analyze conflict-free access to (1) parallel memories and (2) alignment networks. Finding a multi-pattern storage scheme is an NP-complete problem. We propose an effective compiler heuristic for finding a storage matrix that minimizes overall memory access time; it applies to arbitrary linear patterns and arbitrary alignment networks. The proposed heuristic is shown to find an optimal storage scheme for parallel (1) FFT and (2) bitonic sorting. The resulting storage scheme outperforms statically optimized storages in the case of power-of-2 multi-stride access. The case of non-power-of-2 strides is also addressed. The performance and scalability of the proposed parallel memory, and its predictable access time, are evaluated using numerical and multimedia algorithms. Our storage scheme achieves a memory utilization above 83% for 64 memories, which substantially outperforms previous proposals. Our approach provides a tool for matching the storage pattern to the data access patterns of embedded systems that run streamed computations with predictable data access patterns.
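
To make the idea of a Boolean-matrix storage scheme concrete, the following is a minimal sketch of a generic XOR-based (linear) bank mapping, not the heuristic or the matrices from the paper. Everything in it is an illustrative assumption: the function names `bank_of` and `is_conflict_free`, the 4x4 row-major array, the four memory banks, and the two example matrices. Each row of the matrix is stored as a bit mask, and one bit of the bank index is the GF(2) dot product of that row with the address bits. A pattern is treated as conflict-free when its accesses land in distinct banks.

```python
def bank_of(addr, rows):
    """Map an address to a bank index.

    Bit j of the bank index is the parity of (rows[j] AND addr), i.e. the
    GF(2) dot product of matrix row j with the address bit vector.
    """
    bank = 0
    for j, row in enumerate(rows):
        parity = bin(row & addr).count("1") & 1
        bank |= parity << j
    return bank


def is_conflict_free(base, stride, rows):
    """Check whether one access of 2**len(rows) strided elements hits
    every bank exactly once (hypothetical helper for this sketch)."""
    n_banks = 1 << len(rows)
    banks = {bank_of(base + i * stride, rows) for i in range(n_banks)}
    return len(banks) == n_banks


# Assumed setup: 4x4 array stored row-major (addr = 4*r + c), 4 banks.
# Plain interleaving: the bank index is the two low address bits.
INTERLEAVED = [0b0001, 0b0010]
# XOR scheme: bank bit 0 = a0 ^ a2, bank bit 1 = a1 ^ a3.
XOR_SCHEME = [0b0101, 0b1010]

if __name__ == "__main__":
    for name, rows in (("interleaved", INTERLEAVED), ("xor", XOR_SCHEME)):
        row_ok = all(is_conflict_free(4 * r, 1, rows) for r in range(4))
        col_ok = all(is_conflict_free(c, 4, rows) for c in range(4))
        print(name, "rows conflict-free:", row_ok, "columns conflict-free:", col_ok)
```

Under these assumptions, plain interleaving serves row accesses (stride 1) without conflicts but sends every element of a column (stride 4) to the same bank, whereas the XOR matrix serves both rows and columns conflict-free. Other patterns, such as the main diagonal (stride 5), still collide under this particular matrix, which illustrates why choosing one matrix for a whole set of patterns is a search problem.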