Hitting the memory wall: implications of the obvious
ACM SIGARCH Computer Architecture News
Estimating cache misses and locality using stack distances
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Combined DRAM and logic chip for massively parallel systems
ARVLSI '95 Proceedings of the 16th Conference on Advanced Research in VLSI (ARVLSI'95)
PIM Architectures to Support Petaflops Level Computation in the HTMT Machine
IWIA '99 Proceedings of the 1999 International Workshop on Innovative Architecture
Automatically Mapping Code on an Intelligent Memory Architecture
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
The Earth Simulator and Beyond-Technological Considerations Toward the Sustained PetaFlops Machine
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Analysis and Modeling of Advanced PIM Architecture Design Tradeoffs
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
QCDOC: A 10 Teraflops Computer for Tightly-Coupled Calculations
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Hi-index | 0.00 |
VIM integrates vector units into memory, which exploits the low-latency and high-bandwidth memory access. On VIM-based architecture, the low temporal locality thread running on VIM processor is called Light-Weight Thread, while the low cache miss rate thread running on host processor is called Heavy-Weight Thread. The thread distinguishment can impact the system performance directly. Compared with the distinguishment at programming model level, compile-time thread distinguishment can release programmer from changing existing program. After overviewing the VIM micro-architecture and the system architecture, this paper presents an analytical model of thread distinguishment. Based on this model, we present a compile-time algorithm and evaluate it with two thread instances on the evaluation environment we develop. We find that parameters affecting the thread distinguishment are the cache miss rate, the vectorizable operation rate and the arithmetic-to-memory ratio. We believe that this algorithm is constructive to improve the performance of the VIM-based node computer.