This approach to processing in memory integrates single-instruction, multiple-data (SIMD) processing elements into the memory subsystem of a conventional computer. The processor-in-memory (PIM) chip is an enhanced 4-bit SRAM that associates a single-bit processor with each column of memory. To explore the viability of processing in memory, the authors built the Terasys workstation, a Sparcstation-2 augmented with 8 Mbytes of PIM memory holding 32K single-bit processors. They also designed and implemented a high-level parallel language called data-parallel bit C (dbC). In normal memory mode, the PIM chips function as additional SBus memory for the Sparc-2. In SIMD mode, the PIM chips accept commands from the Sparc-2 and execute them simultaneously on all PIM processors. Pairs of commands can be issued every 200 nanoseconds, giving an effective issue rate of one command every 100 ns. Peak performance for the 32K-processor system is 3.2 × 10^11 bit operations per second. Microcoded applications have reached (and in one case, exceeded) this theoretical peak, which is equivalent to 25 Cray Y-MP processors. With the successful creation of the Terasys research prototype, the authors have begun work on PIM in a supercomputer setting. In a collaborative research project with Cray Computer, they are incorporating a new Cray-designed implementation of the PIM chips into two octants of Cray-3 memory.
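The peak figure and the one-processor-per-column organization can be illustrated with a short Python sketch. It is not Terasys microcode or dbC: the function names, the row-as-integer memory layout, and the reading of "32K" as 32 × 1024 are assumptions made purely for illustration. The first function reproduces the peak-rate arithmetic from the stated issue timing; the rest models a bit-serial addition in which every column processes one bit of its own operands per step.

```python
# Illustrative sketch only; names and layout are assumptions, not the Terasys design.

NUM_PROCESSORS = 32 * 1024      # assumption: "32K" read as 32 * 1024
PAIR_ISSUE_INTERVAL_NS = 200    # from the text: a pair of commands every 200 ns
COMMANDS_PER_PAIR = 2


def peak_bit_ops_per_second(num_processors=NUM_PROCESSORS,
                            pair_interval_ns=PAIR_ISSUE_INTERVAL_NS,
                            commands_per_pair=COMMANDS_PER_PAIR):
    """Peak rate, assuming one single-bit operation per processor per command."""
    effective_interval_ns = pair_interval_ns / commands_per_pair  # 100 ns
    commands_per_second = 1e9 / effective_interval_ns
    return num_processors * commands_per_second


def pack(values, width):
    """Store one value per column: row i holds bit i of every value.

    Each row is a Python int used as a bit vector; bit p belongs to processor p.
    """
    rows = []
    for i in range(width):
        row = 0
        for p, value in enumerate(values):
            row |= ((value >> i) & 1) << p
        rows.append(row)
    return rows


def unpack(rows, num_columns):
    """Inverse of pack: read each column back as an integer."""
    return [sum(((row >> p) & 1) << i for i, row in enumerate(rows))
            for p in range(num_columns)]


def bit_serial_add(a_rows, b_rows):
    """Add two packed operands one bit position at a time.

    Every bitwise operation below acts on all columns at once, which mimics
    a SIMD command broadcast to all single-bit processors.
    """
    carry = 0
    sum_rows = []
    for a, b in zip(a_rows, b_rows):          # least-significant bit first
        partial = a ^ b
        sum_rows.append(partial ^ carry)      # full-adder sum bit
        carry = (a & b) | (carry & partial)   # full-adder carry-out
    sum_rows.append(carry)                    # final carry becomes the top bit
    return sum_rows


if __name__ == "__main__":
    print(f"peak: {peak_bit_ops_per_second():.4e} bit operations per second")
    a = pack([3, 5, 7, 0], width=4)
    b = pack([1, 2, 9, 15], width=4)
    print(unpack(bit_serial_add(a, b), num_columns=4))
```

With 32 × 1024 processors the computed rate is about 3.28 × 10^11, consistent with the rounded 3.2 × 10^11 quoted above. In the addition, a W-bit add costs on the order of W broadcast steps regardless of how many columns participate, which is why throughput in such a design scales with the number of processors rather than with word width.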