A fast algorithm for particle simulations
Journal of Computational Physics
The connection machine
The Massively Parallel Processor
The Massively Parallel Processor
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Reevaluating Amdahl's law in the multicore era
Journal of Parallel and Distributed Computing
Proceedings of the Conference on Design, Automation and Test in Europe
Improving scratchpad allocation with demand-driven data tiling
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
An analytical network performance model for SIMD processor CSX600 interconnects
Journal of Systems Architecture: the EUROMICRO Journal
Implementation and evaluation of an arithmetic pipeline on FLOPS-2D: multi-FPGA system
ACM SIGARCH Computer Architecture News
Optimizing modulo scheduling to achieve reuse and concurrency for stream processors
The Journal of Supercomputing
ACM Transactions on Architecture and Code Optimization (TACO)
A high efficient on-chip interconnection network in SIMD CMPs
ICA3PP'10 Proceedings of the 10th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
GRAPE-8: an accelerator for gravitational N-body simulation with 20.5Gflops/W performance
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Hi-index | 0.00 |
We describe the GRAPE-DR (Greatly Reduced Array of Processor Elements with Data Reduction) system, which will consist of 4096 processor chips each with 512 cores operating at the clock frequency of 500 MHz. The peak speed of a processor chip is 512Gflops (single precision) or 256 Gflops (double precision). The GRAPE-DR chip works as an attached processor to standard PCs. Currently, a PCI-X board with single GRAPE-DR chip is in operation. We are developing a 4-chip board with PCI-Express interface, which will have the peak performance of 1 Tflops. The final system will be a cluster of 512 PCs each with two GRAPE-DR boards. We plan to complete the final system by early 2009. The application area of GRAPE-DR covers particle-based simulations such as astrophysical many-body simulations and molecular-dynamics simulations, quantum chemistry calculations, various applications which requires dense matrix operations, and many other compute-intensive applications.