An efficient in-place 3D transpose for multicore processors with software managed memory hierarchy

  • Authors:
  • Ali El-Moursy; Ahmed El-Mahdy; Hisham El-Shishiny

  • Affiliations:
  • Electronics Research Institute, Giza, Egypt; Alexandria University, Alexandria, Egypt; IBM Centre for Advanced Studies in Cairo, IBM WTC, El-Ahram, Giza, Egypt

  • Venue:
  • IFMT '08: Proceedings of the 1st International Forum on Next-Generation Multicore/Manycore Technologies
  • Year:
  • 2008

Abstract

3D transpose is an important operation in many large-scale scientific applications, such as seismic and medical imaging. This paper proposes a novel algorithm for a fast in-place 3D transpose operation. The algorithm exploits a Single Instruction Multiple Data (SIMD) multicore architecture with a software-managed memory hierarchy. Such architectural features are present in next-generation processors such as the Cell Broadband Engine (Cell BE) processor. The algorithm performs transposition at two levels of granularity: at a coarse level, where logical transposition is done by merely transposing the address map at each access; and at a fine-grain level, where physical transposition is done by actual element displacement/swapping. This combination achieves high on-chip bandwidth, because it permits large transfer sizes, while still allowing fine-grain SIMD operations. The transfer rate is further improved by batch-transposing spatially joined data along a major axis. Results on the Cell BE processor show substantial utilisation of on-chip communication bandwidth and negligible processing time.
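The two-level idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' Cell BE implementation; it rests on several assumptions. The volume is an N×N×N cube stored row-major in a flat list that stands in for main memory, the transpose swaps the first and third axes, the block edge `b` divides `N`, and `get_block`/`put_block` are hypothetical stand-ins for DMA transfers into a core's local store. At the coarse level no block is moved to an intermediate location: block (bi, bj, bk) is simply written back at its partner's address (bk, bj, bi). At the fine level, elements are physically swapped inside the local buffer.

```python
def transpose_xz_inplace(mem, n, b):
    """Swap axes 0 and 2 of an n*n*n cube held row-major in the flat list `mem`.

    Hypothetical sketch: `mem` plays the role of main memory, and each
    b*b*b `buf` plays the role of a core's local store.
    """
    if n % b:
        raise ValueError("block edge b must divide n")
    nb = n // b

    def addr(i, j, k):
        # Row-major main-memory address of element (i, j, k).
        return (i * n + j) * n + k

    def get_block(bi, bj, bk):
        # Stand-in for a DMA "get": copy block (bi, bj, bk) into a local buffer,
        # one contiguous row of b elements per transfer.
        buf = []
        for a in range(b):
            for c in range(b):
                start = addr(bi * b + a, bj * b + c, bk * b)
                buf.extend(mem[start:start + b])
        return buf

    def put_block(buf, bi, bj, bk):
        # Stand-in for a DMA "put": write a local buffer back as block (bi, bj, bk).
        for a in range(b):
            for c in range(b):
                start = addr(bi * b + a, bj * b + c, bk * b)
                off = (a * b + c) * b
                mem[start:start + b] = buf[off:off + b]

    def transpose_local(buf):
        # Fine level: physically swap local element (a, c, d) with (d, c, a).
        for a in range(b):
            for c in range(b):
                for d in range(a + 1, b):
                    p = (a * b + c) * b + d
                    q = (d * b + c) * b + a
                    buf[p], buf[q] = buf[q], buf[p]

    # Coarse level: block (bi, bj, bk) trades places with (bk, bj, bi).
    # The swap happens only through the write-back address; no block
    # is copied to an intermediate location in main memory.
    for bi in range(nb):
        for bj in range(nb):
            for bk in range(bi, nb):
                x = get_block(bi, bj, bk)
                transpose_local(x)
                if bk == bi:
                    put_block(x, bi, bj, bk)   # diagonal block stays in place
                else:
                    y = get_block(bk, bj, bi)
                    transpose_local(y)
                    put_block(x, bk, bj, bi)
                    put_block(y, bi, bj, bk)


if __name__ == "__main__":
    cube = list(range(8))          # 2x2x2 cube; each value equals its flat index
    transpose_xz_inplace(cube, 2, 1)
    print(cube)                    # [0, 4, 2, 6, 1, 5, 3, 7]
```

Each block is fetched exactly once, and only O(b³) scratch space (the local buffers) is needed beyond the cube itself. The sketch does not model SIMD or the batching of spatially joined data along a major axis; in a real kernel, `transpose_local` would be vectorized, and transfers would be sized to make full use of on-chip bandwidth.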