Processing-In-Memory (PIM) circumvents the von Neumann bottleneck by combining logic and memory (typically DRAM) on a single die. This work examines the memory system parameters needed to construct PIM-based parallel computers capable of meeting the memory access demands of complex programs that exhibit low data reuse and non-uniform stride accesses. The analysis uses the Data Intensive Systems (DIS) benchmark suite to examine these demanding memory access patterns, and the characteristics of such applications are discussed in detail. Simulations demonstrate that PIMs are capable of supporting enough data to serve as multicomputer nodes. The results also show that even data-intensive code exhibits a large amount of internal spatial locality. A mobile thread execution model is presented that takes advantage of both the tremendous internal bandwidth available on a given PIM node and the locality exhibited by the application.
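One simple way to make the notion of internal spatial locality concrete is to count how often consecutive accesses in an address trace fall within the same memory row. The sketch below is illustrative only and is not the metric or simulator used in this work: the function name `row_hit_fraction`, the 2048-byte row size, and both synthetic traces are assumptions chosen for the example.

```python
def row_hit_fraction(addresses, row_bytes=2048):
    """Return the fraction of accesses that land in the same memory row
    as the access immediately before them.

    row_bytes is a hypothetical row size, not a figure from the paper.
    """
    if len(addresses) < 2:
        return 0.0
    hits = 0
    prev_row = addresses[0] // row_bytes
    for addr in addresses[1:]:
        row = addr // row_bytes
        if row == prev_row:
            hits += 1
        prev_row = row
    return hits / (len(addresses) - 1)


# Synthetic trace 1: unit-stride walk over 8-byte words.
# Almost every access stays within the row touched by the previous one.
sequential = [i * 8 for i in range(1024)]

# Synthetic trace 2: a stride of one full row.
# Every access touches a different row from the previous one.
row_stride = [i * 2048 for i in range(1024)]

if __name__ == "__main__":
    print("sequential:", row_hit_fraction(sequential))
    print("row stride:", row_hit_fraction(row_stride))
```

Under these assumptions, a trace with a high same-row fraction is one that could be served largely from a single row activation, which is the kind of locality a wide internal PIM datapath can exploit. Real traces from data-intensive codes would fall somewhere between the two synthetic extremes.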