Parallelization, performance analysis, and algorithm consideration of Hough transform on chip multiprocessors

Authors:
Wenlong Li;Yen-Kuang Chen
Affiliations:
Intel Corporation;Intel Corporation
Venue:
ACM SIGARCH Computer Architecture News
Year:
2008

Abstract

This paper presents a parallelization framework for emerging applications on future chip multiprocessors (CMPs). As CMPs become prevalent and the number of on-die cores continues to increase, a key issue in harnessing their computation power is how to manage and execute many threads effectively at the same time. We therefore study a parallelization framework that includes (1) coarse-grain and fine-grain multi-threading, (2) performance analysis, and (3) algorithm changes. We use the Hough Transform as an example of how an application can be parallelized. Starting with a soccer analysis workload that relies heavily on the Hough Transform to detect lines in the soccer field, we extract the coarse-grain data-level parallelism and examine its scaling performance on an 8-core symmetric multiprocessor machine. After identifying the factors that limit parallel performance, we exploit the fine-grain data-level parallelism and evaluate its speedup on the 8-core machine and a simulated 64-core CMP. Because of parallel overhead and demanding memory requirements, this fine-grain parallelization does not yield a significant performance improvement. We then propose a new Hough Transform and parallelize it in a fine-grain way. Experimental data show that the new Hough Transform exposes a significant amount of concurrency and good data locality. On the simulated 64-core CMP, we achieve parallel scaling of 61.7x, enabling real-time Hough Transform.
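To make the idea of data-level parallelism in the Hough Transform concrete, the following is a minimal Python sketch of the classical line-detecting Hough Transform, not the new algorithm proposed in the paper. It assumes one common scheme: the edge points are split into chunks, each worker votes into a private accumulator, and the partial accumulators are summed at the end. The function names (`vote`, `hough_lines`, `strongest_line`) and all parameters are hypothetical and chosen only for illustration.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def vote(points, n_theta, n_rho, rho_max):
    """Vote for one chunk of edge points into a private accumulator.

    Assumes every point satisfies |rho| <= rho_max, so bin indices stay in range.
    """
    acc = [[0] * n_rho for _ in range(n_theta)]
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            # Normal form of a line: rho = x*cos(theta) + y*sin(theta)
            rho = x * math.cos(theta) + y * math.sin(theta)
            r = int(round((rho + rho_max) / (2 * rho_max) * (n_rho - 1)))
            acc[t][r] += 1
    return acc


def hough_lines(points, n_theta=180, n_rho=201, rho_max=100.0, workers=4):
    """Split the points across workers, then reduce the private accumulators."""
    chunks = [points[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(
            pool.map(lambda c: vote(c, n_theta, n_rho, rho_max), chunks)
        )
    # Reduction step: element-wise sum of all partial accumulators.
    return [
        [sum(p[t][r] for p in partials) for r in range(n_rho)]
        for t in range(n_theta)
    ]


def strongest_line(acc, rho_max):
    """Return (theta, rho, votes) for the accumulator bin with the most votes."""
    n_theta, n_rho = len(acc), len(acc[0])
    t, r = max(
        ((t, r) for t in range(n_theta) for r in range(n_rho)),
        key=lambda tr: acc[tr[0]][tr[1]],
    )
    theta = math.pi * t / n_theta
    rho = r / (n_rho - 1) * 2 * rho_max - rho_max
    return theta, rho, acc[t][r]
```

In this sketch the private accumulators avoid contention on shared bins, but each worker needs its own full copy of the accumulator and the final reduction is extra work, which is one plausible way that parallel overhead and memory requirements can grow with the thread count. Note that CPython threads will not show real speedup for this CPU-bound loop; the code only illustrates the partitioning and reduction structure.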