Compute-intensive processing can be readily accelerated on many-core processors such as GPUs. For memory-bandwidth-intensive applications such as sparse matrix-vector multiplication (SpMV), however, limited memory bandwidth becomes a more serious bottleneck every year. To accelerate such applications, we previously proposed a memory system with additional scatter and gather functions. In the preliminary evaluation of that system, we assumed that the memory system's throughput was sufficient. In this paper, we propose a memory system with scatter and gather functions that uses many narrow memory channels. We evaluate its feasible throughput for DDR3 DRAM using a modified DRAMsim2 simulator. We also evaluate SpMV performance when our method is applied to the proposed memory system combined with a GPU. The results confirm that the proposed memory system delivers good performance that remains stable across variations in matrix shape, while using fewer pins for external memory.
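To illustrate why SpMV is bound by memory bandwidth and where a gather function would apply, the following is a minimal sketch of SpMV over a matrix in compressed sparse row (CSR) form. It is not the method evaluated in the paper: the CSR layout, the function name `spmv_csr`, and the small example matrix are assumptions chosen for illustration. The gather step is written out separately to show the irregular, index-driven reads of the input vector that such a memory system would aim to serve.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    Hypothetical helper for illustration only; the paper does not
    specify this layout or interface.
    """
    y = []
    for row in range(len(row_ptr) - 1):
        start, end = row_ptr[row], row_ptr[row + 1]
        # Gather: indirect reads of x at positions given by col_idx.
        # These accesses are irregular and dominate memory traffic.
        gathered = [x[col_idx[k]] for k in range(start, end)]
        # Multiply-accumulate over the gathered, now contiguous, operands.
        y.append(sum(v * g for v, g in zip(values[start:end], gathered)))
    return y


# Example matrix (assumed for illustration):
# [[1, 0, 2],
#  [0, 3, 0],
#  [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]

y = spmv_csr(row_ptr, col_idx, values, x)
print(y)  # [7.0, 6.0, 19.0]
```

Each nonzero contributes only one multiply and one add, yet it requires reading a value, a column index, and an element of `x` from an unpredictable address, which is why the throughput of the memory system, rather than arithmetic capability, tends to limit performance.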