QCDSP machines: design, performance and cost

  • Authors:
  • Dong Chen;Ping Chen;Norman H. Christ;Robert G. Edwards;George Fleming;Alan Gara;Sten Hansen;Chulwoo Jung;Adrian Kahler;Stephen Kasow;Anthony D. Kennedy;Greg Kilcup;Yubing Luo;Catalin Malureanu;Robert D. Mawhinney;John Parsons;ChengZhong Sui;Pavlos Vranas;Yuri Zhestkov

  • Affiliations:
  • Massachusetts Institute of Technology;Columbia University;Columbia University;Florida State University;Columbia University;Columbia University Nevis Labs;Fermi National Accelerator Laboratory;Columbia University;Columbia University;Columbia University;Florida State University;Ohio State University;Columbia University;Columbia University;Columbia University;Columbia University Nevis Labs;Columbia University;Columbia University;Columbia University

  • Venue:
  • SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
  • Year:
  • 1998

Abstract

The Quantum Chromodynamics on Digital Signal Processors (QCDSP) machines in operation at Columbia University and nearly complete at the RIKEN Brookhaven Research Center are MIMD machines with processing nodes based on the Texas Instruments TMS320C31-50 digital signal processor (DSP), interconnected as a four-dimensional torus. The Columbia machine contains 8,192 nodes and has a peak speed of 0.4 Tflops. The RIKEN/BNL machine has 12,288 nodes, a peak speed of 0.6 Tflops and a total cost of $1.85M. In order to establish a cost/performance figure for this architecture, we have run two standard lattice quantum chromodynamics (QCD) programs on portions of this hardware. The first program stochastically estimates the trace of the inverse of the Wilson Dirac operator, computed on a series of lattice configurations. Running for 49 minutes on 1/6 of the Brookhaven machine, this code performs floating point operations for a cost/performance of $13.2/Mflops. We also present the performance of a second production program, which generates a Markov chain of lattice configurations distributed according to the statistical weight describing two species of light Wilson quarks interacting with the gauge degrees of freedom of QCD. Running for 1334 minutes on a single cabinet at Columbia (equivalent to 1/12 of the Brookhaven machine), this program executes floating point operations for a cost/performance of $13.6/Mflops. Further information about these machines can be found at http://www.phys.columbia.edu/~cqft.
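
The cost/performance figures quoted above can be related to the other numbers in the abstract with a little arithmetic. The sketch below is illustrative only: it assumes that the hardware cost is prorated linearly with the fraction of the RIKEN/BNL machine used, and that the quoted $/Mflops refers to the sustained (not peak) rate. Neither assumption is confirmed by the abstract, and the function names are hypothetical.

```python
# Figures quoted in the abstract for the RIKEN/BNL machine.
BNL_PEAK_MFLOPS = 0.6e6   # 0.6 Tflops peak, expressed in Mflops
BNL_COST_USD = 1.85e6     # total cost of $1.85M


def implied_sustained_mflops(machine_fraction, usd_per_mflops,
                             total_cost_usd=BNL_COST_USD):
    """Sustained Mflops implied by a quoted $/Mflops figure.

    ASSUMPTION: the cost of a partition is the total cost scaled
    linearly by the fraction of the machine it occupies.
    """
    partition_cost = machine_fraction * total_cost_usd
    return partition_cost / usd_per_mflops


def implied_efficiency(machine_fraction, usd_per_mflops):
    """Implied sustained rate as a fraction of the partition's peak rate.

    ASSUMPTION: peak speed also scales linearly with the partition size.
    """
    partition_peak = machine_fraction * BNL_PEAK_MFLOPS
    return implied_sustained_mflops(machine_fraction, usd_per_mflops) / partition_peak


if __name__ == "__main__":
    # (label, fraction of the RIKEN/BNL machine, quoted $/Mflops)
    runs = [
        ("Wilson Dirac trace estimate", 1 / 6, 13.2),
        ("Two-flavor Markov chain", 1 / 12, 13.6),
    ]
    for label, fraction, price in runs:
        rate = implied_sustained_mflops(fraction, price)
        eff = implied_efficiency(fraction, price)
        print(f"{label}: ~{rate:,.0f} Mflops sustained, ~{eff:.0%} of peak")
```

Under these assumptions, both runs would correspond to a sustained rate of roughly 23% of peak. That is a consistency check on the quoted figures rather than a number reported in the abstract.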