A new 5 V 0.8 μm CMOS technology merges 100 K custom circuits and 4.5 Mb of DRAM onto a single die that supports both high-density memory and significant computing logic. One of the first chips built with this technology implements a unique Processor-In-Memory (PIM) computer architecture termed EXECUBE and has 8 separate 25 MHz CPU macros and 16 separate 32 K × 9 b DRAM macros on a single die. These macros are organized together to provide a single part type for scalable massively parallel processing applications, particularly embedded ones where minimal glue logic is desired. Each chip delivers 50 Mips of performance at 2.7 W. This paper overviews the basic chip technology and organization, gives some projections on the future of EXECUBE-like PIM chips, and finally draws some lessons as to why this technology should radically affect the way we ought to think about computer architecture.
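The figures quoted above can be cross-checked with a little arithmetic. The sketch below is illustrative only: the constant and function names are made up for this example, and it assumes that "K" means 1024 and "Mb" means 2^20 bits, which the abstract does not state explicitly. The per-CPU and per-watt values are simple ratios of the quoted numbers, not measurements reported by the authors.

```python
# Figures taken directly from the abstract.
NUM_CPUS = 8              # CPU macros per chip
CPU_CLOCK_MHZ = 25        # clock rate of each CPU macro
NUM_DRAM_MACROS = 16      # DRAM macros per chip
DRAM_WORDS = 32 * 1024    # "32 K" words per macro (assumes K = 1024)
DRAM_WORD_BITS = 9        # bits per word
CHIP_MIPS = 50            # quoted chip performance
CHIP_WATTS = 2.7          # quoted chip power


def dram_megabits() -> float:
    """Total on-chip DRAM in Mb (assumes 1 Mb = 2**20 bits)."""
    return NUM_DRAM_MACROS * DRAM_WORDS * DRAM_WORD_BITS / 2**20


def mips_per_cpu() -> float:
    """Quoted chip Mips divided evenly across the CPU macros."""
    return CHIP_MIPS / NUM_CPUS


def mips_per_watt() -> float:
    """Ratio of quoted performance to quoted power."""
    return CHIP_MIPS / CHIP_WATTS


def instructions_per_cycle() -> float:
    """Derived average instructions per clock cycle per CPU."""
    return mips_per_cpu() / CPU_CLOCK_MHZ


if __name__ == "__main__":
    print(f"DRAM capacity:  {dram_megabits()} Mb")
    print(f"Mips per CPU:   {mips_per_cpu()}")
    print(f"Mips per watt:  {mips_per_watt():.1f}")
    print(f"Instr. / cycle: {instructions_per_cycle()}")
```

Under these assumptions the 16 DRAM macros of 32 K × 9 b add up to exactly the 4.5 Mb quoted, and 50 Mips at 2.7 W works out to roughly 18.5 Mips per watt.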