The architecture of the DIVA processing-in-memory chip

Authors:
Jeff Draper;Jacqueline Chame;Mary Hall;Craig Steele;Tim Barrett;Jeff LaCoss;John Granacki;Jaewook Shin;Chun Chen;Chang Woo Kang;Ihn Kim;Gokhan Daglikoca
Affiliations:
USC Information Sciences Institute, Marina del Rey, CA;USC Information Sciences Institute, Marina del Rey, CA;USC Information Sciences Institute, Marina del Rey, CA;USC Information Sciences Institute, Marina del Rey, CA;USC Information Sciences Institute, Marina del Rey, CA;USC Information Sciences Institute, Marina del Rey, CA;USC Information Sciences Institute, Marina del Rey, CA;USC Information Sciences Institute, Marina del Rey, CA;USC Information Sciences Institute, Marina del Rey, CA;USC Information Sciences Institute, Marina del Rey, CA;USC Information Sciences Institute, Marina del Rey, CA;USC Information Sciences Institute, Marina del Rey, CA
Venue:
ICS '02 Proceedings of the 16th international conference on Supercomputing
Year:
2002

Citing 17
Cited 24

Active messages: a mechanism for integrated communication and computation

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Memory bandwidth limitations of future microprocessors

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Missing the memory wall: the case for processor/memory integration

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Active pages: a computation model for intelligent memory

Proceedings of the 25th annual international symposium on Computer architecture
Computer architecture (2nd ed.): a quantitative approach

Computer architecture (2nd ed.): a quantitative approach
Performance of image and video processing with general-purpose processors and media ISA extensions

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Microservers: a new memory semantics for massively parallel computing

ICS '99 Proceedings of the 13th international conference on Supercomputing
Mapping irregular applications to DIVA, a PIM-based data-intensive architecture

SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Exploiting superword level parallelism with multimedia instruction sets

PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Processing in Memory: The Terasys Massively Parallel PIM Array

Computer
Computational RAM: Implementing Processors in Memory

IEEE Design & Test
Subword Parallelism with MAX-2

IEEE Micro
A Case for Intelligent RAM

IEEE Micro
Parallelizing Applications into Silicon

FCCM '99 Proceedings of the Seventh Annual IEEE Symposium on Field-Programmable Custom Computing Machines
An argument for simple COMA

HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Impulse: Building a Smarter Memory Controller

HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
FlexRAM: Toward an Advanced Intelligent Memory System

ICCD '99 Proceedings of the 1999 IEEE International Conference on Computer Design

Gilgamesh: a multithreaded processor-in-memory architecture for petaflops computing

Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Architectural Support for Uniprocessor and Multiprocessor Active Memory Systems

IEEE Transactions on Computers
Design and Optimization of Large Size and Low Overhead Off-Chip Caches

IEEE Transactions on Computers
Data forwarding through in-memory precomputation threads

Proceedings of the 18th annual international conference on Supercomputing
Memory in processor: a novel design paradigm for supercomputing architectures

MEDEA '03 Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications , systems and architecture
Superword-Level Parallelism in the Presence of Control Flow

Proceedings of the international symposium on Code generation and optimization
A Prototype Processing-In-Memory (PIM) Chip for the Data-Intensive Architecture (DIVA) System

Journal of VLSI Signal Processing Systems
Memory In Processor-Supercomputer On a Chip: Processor Design and Execution Semantics for Massive Single-Chip Performance

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 14 - Volume 15
A low cost, multithreaded processing-in-memory system

WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
SCIMA-SMP: on-chip memory processor architecture for SMP

WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
PIM lite: a multithreaded processor-in-memory prototype

GLSVLSI '05 Proceedings of the 15th ACM Great Lakes symposium on VLSI
2 Gbps SerDes design based on IBM Cu-11 (130nm) standard cell technology

GLSVLSI '06 Proceedings of the 16th ACM Great Lakes symposium on VLSI
Processing-in-memory technology for knowledge discovery algorithms

DaMoN '06 Proceedings of the 2nd international workshop on Data management on new hardware
ALP: Efficient support for all levels of parallelism for complex media applications

ACM Transactions on Architecture and Code Optimization (TACO)
Future generation supercomputers I: a paradigm for node architecture

ACM SIGARCH Computer Architecture News - Special issue: ALPS '07---advanced low power systems
The revolution inside the box

Communications of the ACM - Web science
Destructive-read in embedded DRAM, impact on power consumption

Journal of Embedded Computing - Issues in embedded single-chip multicore architectures
Evaluating compiler technology for control-flow optimizations for multimedia extension architectures

Microprocessors & Microsystems
Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation

Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation
Empirical study for optimization of power-performance with on-chip memory

ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
Preliminary design examination of the ParalleX system from a software and hardware perspective

ACM SIGMETRICS Performance Evaluation Review - Special issue on the 1st international workshop on performance modeling, benchmarking and simulation of high performance computing systems (PMBS 10)
Cache write-back schemes for embedded destructive-read DRAM

ARCS'06 Proceedings of the 19th international conference on Architecture of Computing Systems
Performance analysis of user-level PIM communication in the data intensive architecture (DIVA) system

HiPC'05 Proceedings of the 12th international conference on High Performance Computing
A new perspective on processing-in-memory architecture design

Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness

Quantified Score

Hi-index	0.01

Visualization

Abstract

The DIVA (Data IntensiVe Architecture) system incorporates a collection of Processing-In-Memory (PIM) chips as smart-memory co-processors to a conventional microprocessor. We have recently fabricated prototype DIVA PIMs. These chips represent the first smart-memory devices designed to support virtual addressing and capable of executing multiple threads of control. In this paper, we describe the prototype PIM architecture. We emphasize three unique features of DIVA PIMs, namely, the memory interface to the host processor, the 256-bit wide datapaths for exploiting on-chip bandwidth, and the address translation unit. We present detailed simulation results on eight benchmark applications. When just a single PIM chip is used, we achieve an average speedup of 3.3X over host-only execution, due to lower memory stall times and increased fine-grain parallelism. These 1-PIM results suggest that a PIM-based architecture with many such chips yields significantly higher performance than a multiprocessor of a similar scale and at a much reduced hardware cost.