This paper summarizes a proposal submitted to the NSF 100 Tera Flops Point Design Study. Its main thesis is that Processing-In-Memory (PIM) technology can provide an extremely dense and highly efficient base on which such computing systems can be constructed. The paper describes:

- a strawman organization of one potential PIM chip;
- how multiple such chips might be organized into a real system;
- what the software supporting such a system might look like;
- several applications that we will attempt to place onto such a system.