Parallel JPEG2000 Image Coding on Multiprocessors
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
System-on-chip beyond the nanometer wall
Proceedings of the 40th annual Design Automation Conference
NoC Synthesis Flow for Customized Domain Specific Multiprocessor Systems-on-Chip
IEEE Transactions on Parallel and Distributed Systems
Automatic thread distribution for nested parallelism in OpenMP
Proceedings of the 19th annual international conference on Supercomputing
Load balancing and OpenMP implementation of nested parallelism
Parallel Computing - OpenMp
Hybrid image classification and parameter selection using a shared memory parallel algorithm
Computers & Geosciences
Nested parallelization with OpenMP
International Journal of Parallel Programming
Runtime adjustment of parallel nested loops
WOMPAT'04 Proceedings of the 5th international conference on OpenMP Applications and Tools: shared Memory Parallel Programming with OpenMP
Hi-index | 0.00 |
The recent trends in processor architecture show that parallel processing is moving into new areas of computing in the form of many-core desktop processors and multi-processor system-on-chips. This means that parallel processing is required in application areas that traditionally have not used parallel programs. This paper investigates parallelism and scalability of an embedded image processing application. The major challenges faced when parallelizing the application were to extract enough parallelism from the application and to reduce load imbalance. The application has limited immediately available parallelism and further extraction of parallelism is limited by small data sets and a relatively high parallelization overhead. Load balance is difficult to obtain due to the limited parallelism and made worse by non-uniform memory latency. Three parallel OpenMP implementations of the application are discussed and evaluated. We show that with some modifications relative speedups in excess of 9 on a 16 CPU system can be reached.