Real-time analysis of image sequences is a challenge. Using high-performance computing technologies, we propose a parallel algorithm for data sequence analysis, which we call the pipelined algorithm (PA). The idea underlying the design of PA comes from the Pipes and Filters design approach: partition the sequence into ordered subsets and overlap task execution via pipelining. Moreover, to improve the performance gain of PA, task execution is distributed among multicore processors. The approach chosen for introducing concurrency takes into account the hierarchical parallelism of the system architecture of multicore multiprocessors. More precisely, three parallelization strategies for PA are considered. The first distributes the execution of each task among the same number of cores, employing fine-grained task parallelism (we call it inter-task data parallelism). The second assigns the execution of each task to one core, introducing concurrency at a coarser level (we call it intra-task functional parallelism). The third combines the previous two approaches: each task is mapped to a group of cores (intra-task functional parallelism), and the task's execution is distributed within each group (inter-task data parallelism). We prove, both theoretically and experimentally, that the third strategy is more effective than the others in terms of speed-up improvement as the data length increases. As a testbed, the segmentation of ultrasound sequences is considered. Experiments on real data are carried out on a multicore-based parallel computer system relying on PETSc (Portable, Extensible Toolkit for Scientific Computation), a high-level software computing environment.
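The Pipes and Filters idea described above (partition the sequence into ordered subsets, then overlap the tasks) can be illustrated with a minimal Python sketch. It is not the authors' PETSc implementation: the names `run_pipeline`, `partition`, `smooth`, and `threshold` are hypothetical, the two filters are toy stand-ins for real segmentation steps, and one thread per filter only loosely mirrors the strategy that maps each task to one core.

```python
import queue
import threading

_STOP = object()  # sentinel marking the end of the sequence


def _stage(func, inbox, outbox):
    """Run one filter: apply func to each subset in arrival order."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # forward the end marker downstream
            return
        outbox.put(func(item))


def run_pipeline(filters, subsets):
    """Push ordered subsets through the filters, one thread per filter.

    While filter k works on subset i, filter k-1 can already work on
    subset i+1; this is the overlap that pipelining provides.
    """
    queues = [queue.Queue() for _ in range(len(filters) + 1)]
    threads = [
        threading.Thread(target=_stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(filters)
    ]
    for t in threads:
        t.start()
    for s in subsets:
        queues[0].put(s)
    queues[0].put(_STOP)

    results = []
    while True:
        out = queues[-1].get()
        if out is _STOP:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


def partition(seq, size):
    """Split the sequence into ordered subsets of at most `size` items."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


# Hypothetical filters standing in for real image-processing tasks.
def smooth(frames):
    return [f * 0.5 for f in frames]


def threshold(frames):
    return [1 if f > 1.0 else 0 for f in frames]


frames = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]  # toy "image sequence"
segmented = run_pipeline([smooth, threshold], partition(frames, 2))
```

Because each filter has a single worker and the queues are FIFO, the subsets leave the pipeline in the order they entered. Distributing a single task's work across several cores, as in the other two strategies, would require replacing the body of each filter with a data-parallel computation, which this sketch does not attempt.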