Pursuing a Petaflop: Point Designs for 100 TF Computers Using PIM Technologies

Authors:
P. M. Kogge;S. C. Bass;J. B. Brockman;D. Z. Chen;E. Sha
Venue:
FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation
Year:
1996

Citing 10
Cited 18

Planar point location using persistent search trees

Communications of the ACM
Making data structures persistent

Journal of Computer and System Sciences - 18th Annual ACM Symposium on Theory of Computing (STOC), May 28-30, 1986
Transformation approach to numerically integrating PDEs by means of WDF principles

Multidimensional Systems and Signal Processing
Numerical integration of partial differential equations using principles of multidimensional wave digital filters

Journal of VLSI Signal Processing Systems - Parallel processing on VLSI arrays
Enabling technologies for petaflops computing

Enabling technologies for petaflops computing
A methodology for concurrent fabrication process/cell library optimization

DAC '96 Proceedings of the 33rd annual Design Automation Conference
Optimal Data Scheduling for Uniform Multidimensional Applications

IEEE Transactions on Computers
Combined DRAM and logic chip for massively parallel systems

ARVLSI '95 Proceedings of the 16th Conference on Advanced Research in VLSI (ARVLSI'95)
EXECUBE-A New Architecture for Scaleable MPPs

ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 01
Full Parallelism in Uniform Nested Loops Using Multi-Dimensional Retiming

ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 02

A design analysis of a hybrid technology multithreaded architecture for petaflops scale computation

ICS '99 Proceedings of the 13th international conference on Supercomputing
Microservers: a new memory semantics for massively parallel computing

ICS '99 Proceedings of the 13th international conference on Supercomputing
Mapping irregular applications to DIVA, a PIM-based data-intensive architecture

SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Communication Reduction in Multiple Multicasts Based on Hybrid Static-Dynamic Scheduling

IEEE Transactions on Parallel and Distributed Systems
Automatic Code Mapping on an Intelligent Memory Architecture

IEEE Transactions on Computers
Demonstrating the Scalability of a Molecular Dynamics Application on a Petaflops Computer

International Journal of Parallel Programming
A Parallel-Object Programming Model for PetaFLOPS Machines and Blue Gene/Cyclops

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
A Microserver View of HTMT

IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Energy/Performance Design of Memory Hierarchies for Processor-in-Memory Chips

IMS '00 Revised Papers from the Second International Workshop on Intelligent Memory Systems
Adaptively Mapping Code in an Intelligent Memory Architecture

IMS '00 Revised Papers from the Second International Workshop on Intelligent Memory Systems
Dissecting Cyclops: a detailed analysis of a multithreaded architecture

ACM SIGARCH Computer Architecture News
Programming the FlexRAM parallel intelligent memory system

Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
The impact of grain size on the efficiency of embedded SIMD image processing architectures

Journal of Parallel and Distributed Computing
Enhancing NIC Performance for MPI using Processing-in-Memory

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 9 - Volume 10
High Performance Computing Systems for Autonomous Spaceborne Missions

International Journal of High Performance Computing Applications
Energy savings through embedded processing on disk system

ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
Evaluation of OpenMP for the cyclops multithreaded architecture

WOMPAT'03 Proceedings of the OpenMP applications and tools 2003 international conference on OpenMP shared memory parallel programming
Enhanced loop coalescing: a compiler technique for transforming non-uniform iteration spaces

ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems

Abstract

This paper summarizes a proposal submitted to the NSF 100 Tera Flops Point Design Study. Its main thesis is that Processing-In-Memory (PIM) technology can provide an extremely dense and highly efficient base on which such computing systems can be constructed. The paper describes a strawman organization of one potential PIM chip, how multiple such chips might be organized into a real system, what the software supporting such a system might look like, and several applications that we will attempt to place onto such a system.