Analysis and Modeling of Advanced PIM Architecture Design Tradeoffs

Authors:
Ed Upchurch;Thomas Sterling;Jay Brockman
Affiliations:
California Institute of Technology;California Institute of Technology;University of Notre Dame
Venue:
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Year:
2004

Abstract

A major trend in high-performance computer architecture over the last two decades is the migration of memory, in the form of high-speed caches, onto the microprocessor semiconductor die. Where temporal locality in the computation is high, caches prove very effective at hiding memory access latency and contention for communication resources. However, where temporal locality is absent, caches may exhibit low hit rates, resulting in poor operational efficiency. Vector computing, which exploits pipelined arithmetic units and pipelined memory access, addresses this challenge for certain forms of data access patterns, for example those involving long contiguous data sets exhibiting high spatial locality. But for many advanced applications in science, technology, and national security, at least some data access patterns are not consistent with the restricted forms handled well by either caches or vector processing. An important alternative is the reverse strategy: migrating logic into the main memory (DRAM) and performing those operations directly on the data stored there. Processor in Memory (PIM) architecture has advanced to the point where it may fill this role and provide an important new mechanism for improving the performance and efficiency of future supercomputers for a broad range of applications. One important project considering both the role of PIM in supercomputer architecture and the design of such PIM components is the Cray Cascade Project, sponsored by the DARPA High Productivity Computing Program. Cascade is a Petaflops-scale computer targeted for deployment at the end of the decade that merges the raw speed of an advanced custom vector architecture with the high-memory-bandwidth processing delivered by an innovative class of PIM architecture. The work represented here was performed under the Cascade project to explore critical design space issues that will determine the value of PIM in supercomputers and contribute to the optimization of its design. This work also has strong relevance to hybrid systems that combine conventional microprocessors with advanced PIM-based intelligent main memory.