The expectation maximization (EM) algorithm is one of the most suitable iterative methods for positron emission tomography (PET) image reconstruction; however, it requires a long computation time and an enormous amount of memory. To overcome these problems, we present two classes of highly efficient parallelization schemes: homogeneous and inhomogeneous partitionings. The essential difference between the two classes is that the inhomogeneous partitioning schemes can partially overlap communication with computation by deliberately exploiting the inherent data access pattern through a multiple-ring communication pattern. In theory, the inhomogeneous partitioning schemes may therefore outperform the homogeneous ones; the homogeneous schemes, however, require a simpler communication pattern. To estimate the achievable performance and to analyze the performance degradation factors without an actual implementation, we have derived efficiency prediction formulas that closely estimate the performance of the proposed parallelization schemes. We also propose new integration and broadcasting algorithms for hypercube, ring, and n-D mesh topologies, which are more efficient than the conventional algorithms when the link setup time is negligible. The concepts behind the proposed task and data partitioning schemes, the integration and broadcasting algorithms, and the efficiency estimation methods can be applied to many other problems that are rich in data parallelism but do not admit a balanced exclusive partitioning.
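For orientation, the serial computation that such schemes would distribute can be sketched as a single step of the standard ML-EM update for emission tomography. This is a minimal illustration, not the partitioning or communication schemes described above. The function name `mlem_step`, the dense list-of-lists system matrix `a`, and the variable names are all assumptions made for the example. The forward and back projections both sweep the entire system matrix, which suggests why the method is costly in time and memory, and why the pixel and detector-tube loops are natural candidates for data-parallel partitioning.

```python
def mlem_step(a, y, lam):
    """Perform one ML-EM update (illustrative sketch; names are hypothetical).

    a[i][j] : probability that an emission in pixel j is detected in tube i
    y[i]    : measured counts in detector tube i
    lam[j]  : current estimate of the emission intensity in pixel j
    """
    n_tubes, n_pixels = len(a), len(lam)

    # Forward projection: expected counts in each tube under the current estimate.
    proj = [sum(a[i][j] * lam[j] for j in range(n_pixels)) for i in range(n_tubes)]

    # Ratio of measured to expected counts; tubes with zero expectation contribute nothing.
    ratio = [y[i] / proj[i] if proj[i] > 0 else 0.0 for i in range(n_tubes)]

    new_lam = []
    for j in range(n_pixels):
        # Sensitivity of pixel j: total detection probability over all tubes.
        sens = sum(a[i][j] for i in range(n_tubes))
        # Back projection of the count ratios onto pixel j.
        back = sum(a[i][j] * ratio[i] for i in range(n_tubes))
        new_lam.append(lam[j] * back / sens if sens > 0 else 0.0)
    return new_lam


if __name__ == "__main__":
    # Tiny made-up system: 2 detector tubes, 2 pixels, columns summing to 1.
    a = [[0.5, 0.25],
         [0.5, 0.75]]
    y = [3.0, 5.0]
    print(mlem_step(a, y, [1.0, 1.0]))
```

In this sketch each pixel update depends on the projections of all tubes, so a parallel version would need some form of global combination and redistribution of partial results between iterations; how that is organized is exactly what the partitioning schemes above address, and it is not modeled here.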