This paper presents a parallelization framework for emerging applications on future chip multiprocessors (CMPs). As CMPs become prevalent and the number of on-die cores keeps increasing, a key issue in harnessing their computational power is how to manage and execute many threads effectively at the same time. We therefore study a parallelization framework that includes (1) coarse-grain and fine-grain multithreading, (2) performance analysis, and (3) algorithmic changes. As an example, this paper shows how the Hough Transform can be parallelized. Starting with a soccer analysis workload that relies heavily on the Hough Transform to detect lines on the soccer field, we extract the coarse-grain data-level parallelism and examine its scaling on an 8-core symmetric multiprocessor machine. After identifying the factors that limit parallel performance, we exploit the fine-grain data-level parallelism and evaluate its speedup on the 8-core machine and on a simulated 64-core CMP. Because of parallel overhead and demanding memory requirements, this fine-grain parallelization does not yield a significant performance improvement. We then propose a new Hough Transform and parallelize it in a fine-grain manner. Experimental data show that the new Hough Transform exposes a significant amount of concurrency and good data locality. On the simulated 64-core CMP, we achieve parallel scaling of 61.7x, enabling real-time Hough Transform.
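To make the data-level decomposition concrete, the following is a minimal sketch of the classical (rho, theta) Hough line transform in which edge points are split across workers, each worker votes into a private accumulator, and the accumulators are summed at the end. It is not the new Hough Transform proposed in the paper, whose details the abstract does not give. The function names (`hough_votes`, `parallel_hough`, `strongest_line`), the bin counts, and the `rho_max` range are all assumptions chosen for illustration.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def hough_votes(points, n_theta, n_rho, rho_max):
    """Vote into a private n_theta x n_rho accumulator for one chunk of edge points."""
    acc = [[0] * n_rho for _ in range(n_theta)]
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            # Map rho in [-rho_max, rho_max] onto a bin index in [0, n_rho - 1].
            r = int(round((rho + rho_max) * (n_rho - 1) / (2 * rho_max)))
            acc[t][r] += 1
    return acc


def parallel_hough(points, n_theta=180, n_rho=201, rho_max=100.0, workers=4):
    """Split edge points across workers, then reduce the private accumulators."""
    # Hypothetical partitioning: a strided split of the edge-point list.
    chunks = [points[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(
            pool.map(lambda c: hough_votes(c, n_theta, n_rho, rho_max), chunks)
        )
    # Reduction step: element-wise sum of all private accumulators.
    total = [[0] * n_rho for _ in range(n_theta)]
    for acc in partials:
        for t in range(n_theta):
            row, part = total[t], acc[t]
            for r in range(n_rho):
                row[r] += part[r]
    return total


def strongest_line(acc, rho_max):
    """Return (votes, theta, rho) for the accumulator cell with the most votes."""
    n_theta, n_rho = len(acc), len(acc[0])
    votes, t, r = max(
        (acc[t][r], t, r) for t in range(n_theta) for r in range(n_rho)
    )
    theta = math.pi * t / n_theta
    rho = r * 2 * rho_max / (n_rho - 1) - rho_max
    return votes, theta, rho


if __name__ == "__main__":
    # Synthetic edge points on the vertical line x = 20.
    edge_points = [(20, y) for y in range(50)]
    accumulator = parallel_hough(edge_points)
    print(strongest_line(accumulator, rho_max=100.0))
```

Private accumulators avoid synchronizing on every vote, at the cost of one accumulator per worker plus a final reduction, which is one way that memory demand and parallel overhead can grow with the worker count. In standard CPython the threads mainly illustrate the decomposition rather than deliver a speedup, because of the global interpreter lock.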