Parallelization, performance analysis, and algorithm consideration of Hough transform on chip multiprocessors

  • Authors:
  • Wenlong Li;Yen-Kuang Chen

  • Affiliations:
  • Intel Corporation;Intel Corporation

  • Venue:
  • ACM SIGARCH Computer Architecture News
  • Year:
  • 2008

Abstract

This paper presents a parallelization framework for emerging applications on future chip multiprocessors (CMPs). As CMPs become prevalent and the number of on-die cores continues to increase, a key issue in harnessing their computation power is how to manage and execute many threads effectively at the same time. We therefore study a parallelization framework that includes (1) coarse-grain and fine-grain multi-threading, (2) performance analysis, and (3) algorithm changes. As an example, this paper shows how the Hough Transform can be parallelized. Starting with a soccer analysis workload that relies heavily on the Hough Transform to detect lines on the soccer field, we extract the coarse-grain data-level parallelism and examine its scaling performance on an 8-core symmetric multiprocessor machine. After identifying the factors that limit parallel performance, we exploit the fine-grain data-level parallelism and evaluate its speedup on the 8-core machine and a simulated 64-core CMP. Because of parallel overhead and demanding memory requirements, this fine-grain parallelization does not yield a significant performance improvement. We then propose a new Hough Transform and parallelize it in a fine-grain way. Experimental data show that the new Hough Transform exposes a significant amount of concurrency and good data locality. On the simulated 64-core CMP, we achieve parallel scaling of 61.7x, enabling real-time Hough Transform.
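
To make the idea of fine-grain data-level parallelism in a Hough Transform concrete, here is a minimal Python sketch. It is not the authors' implementation, and it does not attempt the new Hough Transform, which the abstract does not describe. It assumes the classical line-detecting formulation, in which each edge point (x, y) votes for every line rho = x cos(theta) + y sin(theta), and it splits the angle range across workers so that each worker fills a private accumulator. The names `hough_votes`, `parallel_hough`, and `strongest_line` are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def hough_votes(points, thetas):
    """Vote into a private accumulator for one slice of the angle range.

    `thetas` is a list of (index, angle in radians) pairs. Keeping the
    accumulator private to the worker avoids synchronizing on every vote.
    """
    acc = {}
    for theta_idx, theta in thetas:
        c, s = math.cos(theta), math.sin(theta)
        for x, y in points:
            rho = int(round(x * c + y * s))  # quantize rho to whole pixels
            key = (rho, theta_idx)
            acc[key] = acc.get(key, 0) + 1
    return acc


def parallel_hough(points, n_theta=180, workers=4):
    """Classical Hough Transform with the angle range split across workers."""
    thetas = [(i, math.pi * i / n_theta) for i in range(n_theta)]
    # Interleaved slices: worker w takes angles w, w + workers, ...
    chunks = [thetas[w::workers] for w in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda ch: hough_votes(points, ch), chunks))
    merged = {}
    for part in partials:
        # The slices cover disjoint angles, so no key appears twice.
        merged.update(part)
    return merged


def strongest_line(acc):
    """Return (rho, theta_idx, votes) for the accumulator cell with most votes."""
    (rho, theta_idx), votes = max(acc.items(), key=lambda kv: kv[1])
    return rho, theta_idx, votes
```

Partitioning by angle keeps the per-worker accumulators disjoint, so the merge step is a plain union. Partitioning by edge point instead would require summing overlapping accumulators, which is one source of the parallel overhead and memory demand that the abstract mentions. Python threads do not run CPU-bound code concurrently in the standard interpreter, so this sketch shows only the decomposition, not the scaling.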