PPAM'09 Proceedings of the 8th international conference on Parallel processing and applied mathematics: Part I
We propose an application-specific processor for computational quantum chemistry. The kernel of interest is the computation of electron repulsion integrals (ERIs), whose control flow varies with the input data. This lack of uniformity limits the data-level parallelism (DLP) inherent in the application and appears to make a SIMD architecture infeasible. Because all ERIs may be computed in parallel, however, the application has abundant thread-level parallelism (TLP). We observe that threads with certain characteristics can be matched so as to reveal significant DLP across multiple threads. Our thread matching and scheduling scheme effectively converts TLP to DLP, allowing SIMD processing that was previously infeasible. We envision that this approach may expose DLP in other applications traditionally considered poor candidates for SIMD computation.
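The idea of matching threads to expose DLP can be illustrated with a minimal sketch. It is not the scheme from the paper, whose matching criteria and hardware scheduler are not described in this abstract. It assumes, hypothetically, that each independent ERI task carries a hashable "signature" that determines its control flow, so that tasks with equal signatures could run in lockstep on SIMD lanes. The names `batch_by_signature`, `lane_utilization`, the signature labels, and the SIMD width are all illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

# A task is (signature, payload). The signature is a hypothetical stand-in
# for whatever property decides a task's control flow; the payload is the
# task's input data. Neither is taken from the paper.
Task = Tuple[Hashable, object]
Batch = Tuple[Hashable, List[object]]


def batch_by_signature(tasks: Iterable[Task], simd_width: int) -> List[Batch]:
    """Group tasks that share a signature into batches of at most simd_width."""
    if simd_width < 1:
        raise ValueError("simd_width must be at least 1")

    # Step 1: collect payloads per signature (the "matching" step).
    groups: Dict[Hashable, List[object]] = defaultdict(list)
    for signature, payload in tasks:
        groups[signature].append(payload)

    # Step 2: cut each group into SIMD-sized batches (the "scheduling" step).
    batches: List[Batch] = []
    for signature, payloads in groups.items():
        for start in range(0, len(payloads), simd_width):
            batches.append((signature, payloads[start:start + simd_width]))
    return batches


def lane_utilization(batches: List[Batch], simd_width: int) -> float:
    """Fraction of SIMD lanes doing useful work across all batches."""
    if not batches:
        return 0.0
    used_lanes = sum(len(payloads) for _, payloads in batches)
    return used_lanes / (len(batches) * simd_width)


if __name__ == "__main__":
    # Five independent tasks with two made-up signatures, in arrival order.
    tasks = [("ssss", 0), ("psss", 1), ("ssss", 2), ("ssss", 3), ("psss", 4)]
    batches = batch_by_signature(tasks, simd_width=2)
    for signature, payloads in batches:
        print(signature, payloads)
    print(f"lane utilization: {lane_utilization(batches, 2):.2f}")
```

In this sketch, tasks that arrive interleaved are regrouped so that each batch contains only tasks with the same control flow. Partially filled batches lower lane utilization, which is why a real scheduler would likely need enough pending threads per signature to keep the lanes full.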