A quantum transport approach based on the non-equilibrium Green's function (NEGF) formalism and the tight-binding method has been developed to investigate the performance of atomistically resolved nanoelectronic devices in the presence of electron-phonon scattering. The model is integrated into a four-level parallel environment (bias, momentum, energy, and spatial domain decomposition) that scales almost perfectly up to 220k cores in the ballistic limit of electron transport, where the momentum and energy points form a quasi-embarrassingly parallel problem. The novelty of this paper is the inclusion of scattering self-energies that couple all momenta and several energies, which requires substantial inter-processor communication. An efficient parallel implementation of electron-phonon scattering is therefore proposed and applied to a realistically extended transistor structure. Good scaling of the simulation wall time up to 95,256 cores and a sustained performance of 142 TFlop/s are reported on the Cray XT5 Jaguar.
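To make the four-level decomposition concrete, the following is a minimal sketch of how a flat process rank could be mapped onto (bias, momentum, energy, spatial domain) coordinates, and how the set of independent tasks shrinks once scattering is included. It is not the paper's implementation: the names `Coord`, `rank_to_coord`, and `independent_task_key`, the level ordering, and the group sizes are all hypothetical. As a further simplifying assumption, the sketch treats every energy at a given bias as coupled, whereas the abstract only states that several energies are coupled.

```python
from typing import NamedTuple, Tuple


class Coord(NamedTuple):
    """Position of one process in a hypothetical four-level layout."""
    bias: int
    momentum: int
    energy: int
    domain: int


def rank_to_coord(rank: int, n_bias: int, n_momentum: int,
                  n_energy: int, n_domain: int) -> Coord:
    """Map a flat rank to four-level coordinates (spatial domain varies fastest)."""
    total = n_bias * n_momentum * n_energy * n_domain
    if not 0 <= rank < total:
        raise ValueError(f"rank {rank} outside [0, {total})")
    rest, domain = divmod(rank, n_domain)
    rest, energy = divmod(rest, n_energy)
    bias, momentum = divmod(rest, n_momentum)
    return Coord(bias, momentum, energy, domain)


def independent_task_key(coord: Coord, with_scattering: bool) -> Tuple[int, ...]:
    """Label the group of processes that must cooperate on one task.

    Ballistic limit: each (bias, momentum, energy) point is independent.
    With scattering (coarse assumption): all momenta and energies at a
    given bias are coupled, so only the bias level stays independent.
    """
    if with_scattering:
        return (coord.bias,)
    return (coord.bias, coord.momentum, coord.energy)


if __name__ == "__main__":
    sizes = (2, 3, 4, 5)  # hypothetical group counts per level
    n_ranks = sizes[0] * sizes[1] * sizes[2] * sizes[3]
    coords = [rank_to_coord(r, *sizes) for r in range(n_ranks)]
    ballistic = {independent_task_key(c, False) for c in coords}
    coupled = {independent_task_key(c, True) for c in coords}
    print(len(ballistic), "independent tasks without scattering")
    print(len(coupled), "independent tasks with scattering")
```

In a real MPI code, keys like these would typically serve as the color argument when splitting a communicator, so that only processes sharing a key exchange data.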