The present work investigates the feasibility of finite element methods and topology optimization on unstructured meshes for massively parallel computer architectures, specifically Graphics Processing Units (GPUs). Challenges in the parallel implementation, such as the race condition that arises during parallel assembly, are discussed and resolved with simple algorithms, in this case greedy graph coloring. Every step of the topology optimization process is implemented in parallel, benchmarked, and compared against an equivalent sequential implementation. The ultimate goal of this work is to speed up topology optimization through parallel computing on off-the-shelf hardware. Examples are run with both the standard sequential version and the massively parallel version to illustrate the advantages and disadvantages of this approach.
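To make the role of greedy graph coloring concrete, the following is a minimal sequential sketch in Python, not the authors' GPU implementation. It assumes the mesh is given as a list of node-index tuples, one per element; the function names `greedy_element_coloring` and `group_by_color` and the four-triangle mesh are hypothetical and chosen only for illustration. Two elements that share a node would write to the same entries of the global matrix, so they receive different colors; elements of the same color share no node and could, in principle, be assembled concurrently without a race.

```python
from collections import defaultdict


def greedy_element_coloring(elements):
    """Give each element the smallest color unused by elements sharing a node.

    `elements` is a list of tuples of node indices (an assumed mesh layout).
    """
    # Map every node to the elements that touch it.
    node_to_elements = defaultdict(list)
    for e, nodes in enumerate(elements):
        for n in nodes:
            node_to_elements[n].append(e)

    colors = [-1] * len(elements)  # -1 marks "not yet colored"
    for e, nodes in enumerate(elements):
        # Colors already taken by neighbors, i.e. elements sharing a node.
        used = {
            colors[nb]
            for n in nodes
            for nb in node_to_elements[n]
            if colors[nb] >= 0
        }
        c = 0
        while c in used:
            c += 1
        colors[e] = c
    return colors


def group_by_color(colors):
    """Collect element indices into one batch per color, in color order."""
    groups = defaultdict(list)
    for e, c in enumerate(colors):
        groups[c].append(e)
    return [groups[c] for c in sorted(groups)]


# Hypothetical mesh: a strip of four triangles over six nodes.
mesh = [(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5)]
colors = greedy_element_coloring(mesh)
batches = group_by_color(colors)
print(colors)   # [0, 1, 2, 0]
print(batches)  # [[0, 3], [1], [2]]
```

In this sketch the batches would be processed one after another, with all elements inside a batch handled in parallel. Greedy coloring does not guarantee the minimum number of colors, but it is cheap to compute and only needs to be run once per mesh.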