Numerical solution of the eigenvalue problem for symmetric rationally generated Toeplitz matrices. SIAM Journal on Matrix Analysis and Applications.
Computational frameworks for the fast Fourier transform.
A Shifted Block Lanczos Algorithm for Solving Sparse Symmetric Generalized Eigenproblems. SIAM Journal on Matrix Analysis and Applications.
Fast reliable algorithms for matrices with structure.
Templates for the solution of algebraic eigenvalue problems: a practical guide.
OpenMP: An Industry-Standard API for Shared-Memory Programming. IEEE Computational Science & Engineering.
MPI: A Message-Passing Interface Standard.
High-performance algorithms to solve Toeplitz and block Toeplitz matrices.
SIPs: Shift-and-invert parallel spectral transformations. ACM Transactions on Mathematical Software (TOMS).
A multilevel parallel algorithm to solve symmetric Toeplitz linear systems. The Journal of Supercomputing.
Parallel computation of the eigenvalues of symmetric Toeplitz matrices through iterative methods. Journal of Parallel and Distributed Computing.
In a previous paper (Vidal et al., 2008 [21]), we presented a parallel solver for the symmetric Toeplitz eigenvalue problem based on a modified Lanczos iteration. However, implementing it efficiently on modern parallel architectures is not trivial. In this paper, we present an efficient implementation for multicore processors that takes advantage of the features of this architecture. Several optimization techniques have been incorporated into the algorithm: improved Discrete Sine Transform routines, use of the Gohberg-Semencul formulas to solve the Toeplitz linear systems, optimized workload distribution among processors, and others. Although the algorithm follows a distributed-memory parallel programming paradigm, dictated by the nature of the mathematical derivation, special attention has been paid to obtaining the best performance in multicore environments. Hybrid techniques that combine OpenMP and MPI have been used to increase performance in these environments. Experimental results show that our implementation takes advantage of multicore architectures and clearly outperforms LAPACK and ScaLAPACK.