This preliminary study examines the scaling and performance of a finite element (FE) semiconductor device simulator on multi-socket, multicore architectures whose compute nodes have nonuniform memory access (NUMA). The platforms are two Linux clusters with multicore processors (a quad-socket, quad-core AMD Opteron platform and a dual-socket, quad-core Intel Xeon Nehalem platform) and a dual-socket, six-core AMD Opteron workstation. Each platform has a complex memory hierarchy: per-core cache, memory local to each socket, memory on the same mainboard accessed from another socket, and finally memory on other nodes reached over network links. The device simulator used in this study employs a fully coupled Newton-Krylov solver with domain decomposition and multilevel preconditioners. The scaling results include a large-scale problem with more than 100 million unknowns on 4096 cores and a comparison with the Cray XT3/4 Red Storm capability platform. Although the MPI-only device simulator can use all the cores of the quad-core and six-core CPUs, the efficiency of the linear system solve decreases as the core count increases, which suggests that a different programming paradigm will eventually be needed.
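The solver named in the abstract couples an outer Newton iteration with an inner Krylov (GMRES) solve, preconditioned by domain decomposition. The following is a minimal serial sketch of that structure under stated assumptions, not the simulator's implementation. It assumes a hypothetical 1D model problem (-u'' + u^3 = 1 with zero boundary values) in place of the semiconductor drift-diffusion equations. It also substitutes a one-level, non-overlapping block-Jacobi preconditioner for the multilevel preconditioners and uses NumPy instead of MPI. All function names (`residual`, `jacobian`, `block_jacobi`, `gmres`, `newton_krylov`) are illustrative.

```python
import numpy as np

# Hypothetical model problem (NOT the paper's device equations):
# -u'' + u^3 = 1 on (0, 1), u(0) = u(1) = 0, centered differences on n interior points.
def residual(u, h):
    up = np.concatenate(([0.0], u, [0.0]))  # pad with the Dirichlet boundary values
    lap = (up[:-2] - 2.0 * up[1:-1] + up[2:]) / h**2
    return -lap + u**3 - 1.0

def jacobian(u, h):
    # Assembled, fully coupled Jacobian dF/du: tridiagonal Laplacian plus 3u^2 on the diagonal.
    n = u.size
    off = -np.ones(n - 1) / h**2
    return np.diag(2.0 / h**2 + 3.0 * u**2) + np.diag(off, 1) + np.diag(off, -1)

def block_jacobi(J, nblocks):
    # One-level, non-overlapping domain-decomposition preconditioner: each diagonal
    # block (one "subdomain", i.e. one rank in a parallel code) is inverted independently.
    n = J.shape[0]
    bounds = np.linspace(0, n, nblocks + 1).astype(int)
    blocks = [(a, b, np.linalg.inv(J[a:b, a:b])) for a, b in zip(bounds[:-1], bounds[1:])]

    def apply(r):
        z = np.empty_like(r)
        for a, b, inv in blocks:
            z[a:b] = inv @ r[a:b]  # local subdomain solve, no coupling between blocks
        return z
    return apply

def gmres(A, b, M, tol=1e-10, maxiter=200):
    # Right-preconditioned, unrestarted GMRES: minimize ||b - A M y||, then return x = M y.
    n = b.size
    beta = np.linalg.norm(b)
    if beta == 0.0:
        return np.zeros(n)
    m = min(maxiter, n)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = b / beta
    for j in range(m):
        w = A @ M(V[:, j])
        for i in range(j + 1):  # modified Gram-Schmidt orthogonalization
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        e1 = np.zeros(j + 2)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:j + 2, :j + 1], e1, rcond=None)
        if np.linalg.norm(H[:j + 2, :j + 1] @ y - e1) <= tol * beta or H[j + 1, j] == 0.0:
            break
        V[:, j + 1] = w / H[j + 1, j]
    return M(V[:, :j + 1] @ y)

def newton_krylov(u0, h, nblocks=4, tol=1e-8, maxit=20):
    # Newton's method on F(u) = 0; each correction J du = -F is solved with preconditioned GMRES.
    u = u0.copy()
    for it in range(maxit):
        F = residual(u, h)
        if np.linalg.norm(F) < tol:
            return u, np.linalg.norm(F), it
        J = jacobian(u, h)
        u += gmres(J, -F, block_jacobi(J, nblocks))
    return u, np.linalg.norm(residual(u, h)), maxit
```

For example, `newton_krylov(np.zeros(64), 1.0 / 65)` returns the approximate solution, the final residual norm, and the number of Newton iterations taken. In a distributed-memory code each diagonal block in `block_jacobi` would be owned by one rank, so the subdomain solves need no communication. However, a one-level preconditioner of this kind generally weakens as the number of subdomains grows, which is one motivation for adding multilevel (coarse) corrections.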