Processing in Memory: The Terasys Massively Parallel PIM Array

Authors:
Maya Gokhale;Bill Holmes;Ken Iobst
Venue:
Computer
Year:
1995

Abstract

This approach to processing in memory integrates single-instruction, multiple-data (SIMD) processing elements into the memory subsystem of a conventional computer. The processor-in-memory (PIM) chip is an enhanced 4-bit SRAM that associates a single-bit processor with each column of memory. To explore the viability of processing in memory, the authors built the Terasys workstation, a Sparcstation-2 augmented with 8 Mbytes of PIM memory holding 32K single-bit processors. They have also designed and implemented a high-level parallel language called data-parallel bit C (dbC). In normal memory mode, the PIM chips function as additional Sbus memory for the Sparc-2. In SIMD mode, the PIM chips accept commands from the Sparc-2 and execute those commands simultaneously on all PIM processors. Pairs of commands can be issued every 200 nanoseconds, giving an effective issue rate of one command per 100 ns. Peak performance for the 32K-processor system is 3.2 × 10^11 bit operations per second. Microcoded applications have reached (and in one case, exceeded) this theoretical peak, which is the equivalent of 25 Cray-YMP processors. With the successful creation of the Terasys research prototype, the authors have begun work on PIM in a supercomputer setting. In a collaborative research project with Cray Computer, they are incorporating a new Cray-designed implementation of the PIM chips into two octants of Cray-3 memory.
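
The figures in the abstract can be checked with a little arithmetic, and the idea of one single-bit processor per memory column can be illustrated with a small emulation. The following is a minimal sketch, not the Terasys command set or dbC: the row-per-bit data layout, the function names, and the ripple-carry adder are all assumptions made for illustration. Each Python integer stands for one memory row, with one bit per emulated processor, so a single bitwise operation acts on every column at once, much as a broadcast SIMD command would.

```python
"""Peak-rate arithmetic and a toy bit-serial SIMD emulation.

Assumptions (not from the source): the row-per-bit layout, all helper names,
and the adder below are hypothetical illustrations only.
"""

NUM_PROCESSORS = 32 * 1024            # "32K single-bit processors"
PIM_MEMORY_BYTES = 8 * 1024 * 1024    # "8 Mbytes of PIM memory"
PAIR_ISSUE_INTERVAL_NS = 200          # one pair of commands every 200 ns
COMMANDS_PER_PAIR = 2


def peak_bit_ops_per_second(num_processors=NUM_PROCESSORS):
    """Every processor performs one bit operation per issued command."""
    effective_interval_ns = PAIR_ISSUE_INTERVAL_NS // COMMANDS_PER_PAIR  # 100 ns
    commands_per_second = 1_000_000_000 // effective_interval_ns
    return num_processors * commands_per_second


def bits_per_processor():
    """Memory bits available to each single-bit processor (derived figure)."""
    return PIM_MEMORY_BYTES * 8 // NUM_PROCESSORS


def to_rows(values, width):
    """Transpose per-processor integers into rows: row i holds bit i of every value."""
    rows = []
    for bit in range(width):
        row = 0
        for proc, value in enumerate(values):
            row |= ((value >> bit) & 1) << proc
        rows.append(row)
    return rows


def from_rows(rows, num_values):
    """Inverse of to_rows: gather each processor's column back into an integer."""
    values = [0] * num_values
    for bit, row in enumerate(rows):
        for proc in range(num_values):
            values[proc] |= ((row >> proc) & 1) << bit
    return values


def bit_serial_add(a_rows, b_rows):
    """Add two operands one bit position at a time, across all columns at once."""
    carry = 0
    out = []
    for a, b in zip(a_rows, b_rows):
        out.append(a ^ b ^ carry)              # sum bit for every processor
        carry = (a & b) | (carry & (a ^ b))    # carry bit for every processor
    out.append(carry)                          # final carry-out row
    return out


if __name__ == "__main__":
    print(peak_bit_ops_per_second())
    print(bits_per_processor())
    a = [3, 5, 250, 17]
    b = [4, 9, 10, 0]
    print(from_rows(bit_serial_add(to_rows(a, 8), to_rows(b, 8)), len(a)))
```

With the abstract's numbers, the peak works out to 32,768 processors times 10^7 commands per second, or about 3.28 × 10^11 bit operations per second, in line with the quoted figure. Dividing 8 Mbytes among 32K processors gives 2,048 bits of memory per processor. The adder shows why bit-serial arithmetic takes a number of steps proportional to operand width while processing all columns in parallel.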