Converting massive TLP to DLP: a special-purpose processor for molecular orbital computations

Authors:
Tirath Ramdas; Gregory K. Egan; David Abramson; Kim Baldridge
Affiliations:
Monash University, Melbourne, Australia; Monash University, Melbourne, Australia; Monash University, Melbourne, Australia; University of Zurich, Zurich, Switzerland
Venue:
Proceedings of the 4th international conference on Computing frontiers
Year:
2007

Abstract

We propose an application-specific processor for computational quantum chemistry. The kernel of interest is the computation of electron repulsion integrals (ERIs), whose control flow varies with the input data. This lack of uniformity limits the data-level parallelism (DLP) inherent in the application, apparently rendering a SIMD architecture infeasible. However, all ERIs may be computed in parallel, so there is abundant thread-level parallelism (TLP). We observe that it is possible to match threads with certain characteristics in a manner that reveals significant DLP across multiple threads. Our thread matching and scheduling scheme effectively converts TLP to DLP, allowing SIMD processing that was previously infeasible. We envision that this approach may expose DLP in other applications traditionally considered poor candidates for SIMD computation.
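
The matching idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' scheme: the abstract does not say which thread characteristics are matched, so the sketch assumes each ERI task carries a hypothetical "control-flow signature" (here a 4-tuple, such as the angular momenta of a shell quartet), and that tasks with equal signatures follow the same instruction stream. The names `match_threads` and `lane_utilization` are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# ASSUMPTION: a task is (task id, control-flow signature). The signature is
# hypothetical; the abstract does not specify the matching criterion.
Signature = Tuple[int, int, int, int]
Task = Tuple[int, Signature]
Batch = Tuple[Signature, List[int]]


def match_threads(tasks: List[Task], simd_width: int) -> List[Batch]:
    """Group tasks with identical signatures into batches of at most
    simd_width tasks, so each batch could run in lockstep on SIMD lanes."""
    if simd_width < 1:
        raise ValueError("simd_width must be at least 1")

    # Bucket task ids by signature (independent threads -> matched groups).
    by_signature: Dict[Signature, List[int]] = defaultdict(list)
    for task_id, signature in tasks:
        by_signature[signature].append(task_id)

    # Cut each bucket into SIMD-width chunks; the last chunk may be partial.
    batches: List[Batch] = []
    for signature, ids in by_signature.items():
        for start in range(0, len(ids), simd_width):
            batches.append((signature, ids[start:start + simd_width]))
    return batches


def lane_utilization(batches: List[Batch], simd_width: int) -> float:
    """Fraction of SIMD lanes doing useful work across all batches."""
    if not batches:
        return 0.0
    used = sum(len(ids) for _, ids in batches)
    return used / (len(batches) * simd_width)


if __name__ == "__main__":
    demo = [
        (0, (0, 0, 0, 0)), (1, (1, 0, 0, 0)), (2, (0, 0, 0, 0)),
        (3, (1, 0, 0, 0)), (4, (0, 0, 0, 0)), (5, (1, 1, 0, 0)),
    ]
    for signature, ids in match_threads(demo, simd_width=2):
        print(signature, ids)
    print(lane_utilization(match_threads(demo, simd_width=2), 2))
```

In this toy model, the more tasks share a signature, the fuller the batches become. This is one way to read the abstract's claim that massive TLP can be regrouped into usable DLP.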