A memory accelerator with gather functions for bandwidth-bound irregular applications

Authors:
Noboru Tanabe;Boonyasitpichai Nuttapon;Hironori Nakajo;Yuka Ogawa;Junko Kogou;Masami Takata;Kazuki Joe
Affiliations:
Toshiba Corporation, Kawasaki, Japan;Tokyo University of Agriculture and Technology, Koganei, Japan;Tokyo University of Agriculture and Technology, Koganei, Japan;Nara Women's University, Nara, Japan;Nara Women's University, Nara, Japan;Nara Women's University, Nara, Japan;Nara Women's University, Nara, Japan
Venue:
Proceedings of the first workshop on Irregular applications: architectures and algorithms
Year:
2011

Abstract

Compute-intensive processing can be easily accelerated using many-core processors such as GPUs. However, the memory bandwidth limitation grows more severe each year for memory-bandwidth-intensive applications such as sparse matrix-vector multiplication (SpMV). To accelerate such applications, we previously proposed a memory system with additional scatter and gather functions. In the preliminary evaluation of that system, we assumed that the throughput of the memory system was sufficient. In this paper, we propose a memory system with scatter and gather functions that uses many narrow memory channels. We evaluate the feasible throughput of the proposed memory system, based on DDR3 DRAM, with a modified DRAMsim2 simulator. In addition, we evaluate the performance of SpMV using our method on the proposed memory system combined with a GPU. We confirm that the proposed memory system delivers good performance that remains stable across variations in matrix shape, while using fewer pins for external memory.
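To make the irregular access pattern concrete, the following is a minimal Python sketch of SpMV that separates the gather step from the arithmetic. It is not the authors' implementation. The compressed sparse row (CSR) layout and the names `gather` and `spmv_csr` are assumptions chosen for illustration, since the abstract does not specify a storage format. The sketch only shows where a gather function would fit: the indirect reads of the input vector are collected into a contiguous buffer, so the multiply-accumulate loop sees sequential data.

```python
def gather(x, indices):
    """Collect x[i] for each i in indices into a contiguous list.

    Hypothetical stand-in for a memory-side gather function; here it is
    plain software running on the host.
    """
    return [x[i] for i in indices]


def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A * x for a matrix A stored in CSR form (assumed layout)."""
    y = []
    for r in range(len(row_ptr) - 1):
        start, end = row_ptr[r], row_ptr[r + 1]
        # Irregular part: reads of x are scattered according to col_idx.
        packed = gather(x, col_idx[start:end])
        # Regular part: both operands are now contiguous.
        y.append(sum(v * xv for v, xv in zip(values[start:end], packed)))
    return y


# Example matrix (3 x 3):
# [[1, 0, 2],
#  [0, 3, 0],
#  [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]

y = spmv_csr(row_ptr, col_idx, values, x)
print(y)
```

In this sketch the gather runs in software. In the kind of system the abstract describes, that step would be handled by the memory system, so the processor would consume only the packed, contiguous operands.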