Fast parallel model estimation on the Cell Broadband Engine

  • Authors:
  • Ali Khalili;Amir Fijany;Fouzhan Hosseini;Saeed Safari;Jean-Guy Fontaine

  • Affiliations:
  • Italian Institute of Technology, Genova, Italy;Italian Institute of Technology, Genova, Italy;Italian Institute of Technology, Genova, Italy;Italian Institute of Technology, Genova, Italy and School of Electrical and Computer Engineering, University of Tehran, Iran;Italian Institute of Technology, Genova, Italy

  • Venue:
  • ISVC'10 Proceedings of the 6th international conference on Advances in visual computing - Volume Part II
  • Year:
  • 2010

Abstract

In this paper, we present fast parallel implementations of the RANSAC algorithm on the Cell processor, a multicore SIMD architecture. We describe strategies for efficient parallel implementation of RANSAC that exploit the specific features of the Cell processor. We also discuss a new method for model generation that makes the computation of the homography transformation by RANSAC more efficient. With this new method and the accompanying change of algorithm, we increased the overall performance by a factor of almost 3. We also discuss in detail our approaches for further increasing efficiency through careful vectorization of the computation and by reducing communication overhead through overlapping computation and communication. The results of our practical implementations demonstrate that a very high sustained computational performance (in terms of sustained GFLOPS) can be achieved with minimal communication overhead, enabling real-time generation and evaluation of a very large number of models. With a data set of 2048 data points and 256 models, we achieved a performance of over 80 sustained GFLOPS. Since the peak computing power of our target architecture is 179 GFLOPS, this represents a sustained performance of about 44% of peak, indicating the efficiency of our algorithms and implementations. Our results demonstrate the advantages of parallel implementation of RANSAC on MIMD-SIMD architectures such as the Cell processor. They also show that, compared with a sequential implementation, such a parallel implementation solves a problem with a fixed number of iterations (hypothetical models) much faster, potentially leading to better accuracy of the model.
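
For readers unfamiliar with the workload, the following is a minimal sequential sketch of RANSAC homography estimation with a fixed number of hypothetical models. It is not the authors' Cell implementation or their new model generation method. It uses the textbook four-point fit with the last homography entry fixed to 1, which is an assumption made here for brevity. All function names, the synthetic data, and the inlier tolerance are hypothetical. The 2048 data points and 256 models match the abstract. Generation and evaluation are kept as separate phases because every model can be scored against the data independently of the others, which is the kind of work that can be distributed across cores and vectorized.

```python
import random

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (None if singular)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return None  # degenerate sample, e.g. collinear points
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def homography_from_4(sample):
    """Fit a homography (9 values, last fixed to 1) to four ((x, y), (u, v)) pairs."""
    a, b = [], []
    for (x, y), (u, v) in sample:
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return None if h is None else h + [1.0]

def apply_h(h, x, y):
    """Project (x, y) through h; returns None if the point maps to infinity."""
    w = h[6] * x + h[7] * y + h[8]
    if abs(w) < 1e-12:
        return None
    return ((h[0] * x + h[1] * y + h[2]) / w, (h[3] * x + h[4] * y + h[5]) / w)

def count_inliers(h, pairs, tol):
    """Count correspondences whose reprojection error is at most tol."""
    n = 0
    for (x, y), (u, v) in pairs:
        p = apply_h(h, x, y)
        if p is not None and (p[0] - u) ** 2 + (p[1] - v) ** 2 <= tol * tol:
            n += 1
    return n

def ransac_fixed_models(pairs, num_models, tol, rng):
    """Generate num_models hypotheses, then score each one against all data."""
    # Phase 1: model generation from random minimal samples of four pairs.
    models = []
    for _ in range(num_models):
        h = homography_from_4(rng.sample(pairs, 4))
        if h is not None:
            models.append(h)
    # Phase 2: model evaluation; each model is scored independently.
    scores = [count_inliers(h, pairs, tol) for h in models]
    best = max(range(len(models)), key=scores.__getitem__)
    return models[best], scores[best]

# Synthetic data (hypothetical): 1400 exact inliers and 648 random outliers.
rng = random.Random(0)
TRUE_H = [1.0, 0.1, 5.0, -0.1, 1.0, -3.0, 0.001, 0.002, 1.0]
pairs = []
for i in range(2048):
    x, y = rng.uniform(0, 100), rng.uniform(0, 100)
    if i < 1400:
        pairs.append(((x, y), apply_h(TRUE_H, x, y)))
    else:
        pairs.append(((x, y), (rng.uniform(0, 100), rng.uniform(0, 100))))

best_h, best_score = ransac_fixed_models(pairs, 256, 0.5, rng)
print(best_score, "inliers out of", len(pairs))
```

In this sketch the evaluation phase dominates the cost, since each of the 256 models is tested against all 2048 correspondences, while the generation phase solves only one small linear system per model.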