Microwave tomography for breast cancer detection on Cell broadband engine processors

  • Authors:
  • Meilian Xu;Parimala Thulasiraman;Sima Noghanian

  • Affiliations:
  • Department of Computer Science, University of Manitoba, Winnipeg, Manitoba, Canada;Department of Computer Science, University of Manitoba, Winnipeg, Manitoba, Canada;Department of Electrical Engineering, University of North Dakota, USA

  • Venue:
  • Journal of Parallel and Distributed Computing
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Microwave tomography (MT) is a safe screening modality that can be used for breast cancer detection. The technique uses the dielectric property contrasts between different breast tissues at microwave frequencies to determine the existence of abnormalities. Our proposed MT approach is an iterative process that involves two algorithms: Finite-Difference Time-Domain (FDTD) and Genetic Algorithm (GA). It is a compute intensive problem: (i) the number of iterations can be quite large to detect small tumors; (ii) many fine-grained computations and discretizations of the object under screening are required for accuracy. In our earlier work, we developed a parallel algorithm for microwave tomography on CPU-based homogeneous, multi-core, distributed memory machines. The performance improvement was limited due to communication and synchronization latencies inherent in the algorithm. In this paper, we exploit the parallelism of microwave tomography on the Cell BE processor. Since FDTD is a numerical technique with regular memory accesses, intensive floating point operations and SIMD type operations, the algorithm can be efficiently mapped on the Cell processor achieving significant performance. The initial implementation of FDTD on Cell BE with 8 SPEs is 2.9 times faster than an eight node shared memory machine and 1.45 times faster than an eight node distributed memory machine. In this work, we modify the FDTD algorithm by overlapping computations with communications during asynchronous DMA transfers. The modified algorithm also orchestrates the computations to fully use data between DMA transfers to increase the computation-to-communication ratio. We see 54% improvement on 8 SPEs (27.9% on 1 SPE) for the modified FDTD in comparison to our original FDTD algorithm on Cell BE. We further reduce the synchronization latency between GA and FDTD by using mechanisms such as double buffering. We also propose a performance prediction model based on DMA transfers, number of instructions and operations, the processor frequency and DMA bandwidth. We show that the execution time from our prediction model is comparable (within 0.5 s difference) with the execution time of the experimental results on one SPE.