GRAPE-DR: 2-Pflops massively-parallel computer with 512-core, 512-Gflops processor chips for scientific computing

Authors:
Junichiro Makino;Kei Hiraki;Mary Inaba
Affiliations:
National Astronomical Observatory of Japan, Tokyo, Japan;The University of Tokyo, Tokyo, Japan;The University of Tokyo, Tokyo, Japan
Venue:
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Year:
2007

Citing 3
Cited 10

A fast algorithm for particle simulations

Journal of Computational Physics
The connection machine

The connection machine
The Massively Parallel Processor

The Massively Parallel Processor

MCAMP: communication optimization on massively parallel machines with hierarchical scratch-pad memory

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Reevaluating Amdahl's law in the multicore era

Journal of Parallel and Distributed Computing
Mapping scientific applications on a large-scale data-path accelerator implemented by single-flux quantum (SFQ) circuits

Proceedings of the Conference on Design, Automation and Test in Europe
Improving scratchpad allocation with demand-driven data tiling

CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
An analytical network performance model for SIMD processor CSX600 interconnects

Journal of Systems Architecture: the EUROMICRO Journal
Implementation and evaluation of an arithmetic pipeline on FLOPS-2D: multi-FPGA system

ACM SIGARCH Computer Architecture News
Optimizing modulo scheduling to achieve reuse and concurrency for stream processors

The Journal of Supercomputing
Comparability Graph Coloring for Optimizing Utilization of Software-Managed Stream Register Files for Stream Processors

ACM Transactions on Architecture and Code Optimization (TACO)
A high efficient on-chip interconnection network in SIMD CMPs

ICA3PP'10 Proceedings of the 10th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
GRAPE-8: an accelerator for gravitational N-body simulation with 20.5Gflops/W performance

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Quantified Score

Hi-index	0.00

Visualization

Abstract

We describe the GRAPE-DR (Greatly Reduced Array of Processor Elements with Data Reduction) system, which will consist of 4096 processor chips each with 512 cores operating at the clock frequency of 500 MHz. The peak speed of a processor chip is 512Gflops (single precision) or 256 Gflops (double precision). The GRAPE-DR chip works as an attached processor to standard PCs. Currently, a PCI-X board with single GRAPE-DR chip is in operation. We are developing a 4-chip board with PCI-Express interface, which will have the peak performance of 1 Tflops. The final system will be a cluster of 512 PCs each with two GRAPE-DR boards. We plan to complete the final system by early 2009. The application area of GRAPE-DR covers particle-based simulations such as astrophysical many-body simulations and molecular-dynamics simulations, quantum chemistry calculations, various applications which requires dense matrix operations, and many other compute-intensive applications.