A shared cache for a chip multi vector processor

  • Authors:
  • Akihiro Musa; Yoshiei Sato; Takashi Soga; Koki Okabe; Ryusuke Egawa; Hiroyuki Takizawa; Hiroaki Kobayashi

  • Affiliations:
  • Tohoku University, Sendai, Japan / NEC Corporation, Tokyo, Japan; Tohoku University, Sendai, Japan; NEC System Technologies, Osaka, Japan; Tohoku University, Sendai, Japan; Tohoku University, Sendai, Japan; Tohoku University, Sendai, Japan; Tohoku University, Sendai, Japan

  • Venue:
  • Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture
  • Year:
  • 2008

Abstract

This paper discusses the design of a chip multi vector processor (CMVP), especially examining the effects of an on-chip cache when the off-chip memory bandwidth is limited. As chip multiprocessors (CMPs) have become the mainstream in commodity scalar processors, the CMP architecture will be adopted in the design of vector processors in the near future to harness the large number of transistors on a chip. To maintain high sustained performance when executing scientific and engineering applications, a vector processor (core) generally requires a ratio of memory bandwidth to arithmetic performance of at least 4 bytes/flop (B/FLOP). However, vector supercomputers have been encountering the memory wall problem due to the limited pin bandwidth. Therefore, we propose an on-chip shared cache to maintain the effective memory bandwidth of a CMVP. We evaluate the performance of the CMVP based on the NEC SX vector architecture using real scientific applications. In particular, we examine the caching effect on the sustained performance when the B/FLOP rate is decreased. The experimental results indicate that an 8 MB on-chip shared cache can improve the performance of a four-core CMVP by 15% to 40%, compared with that without the cache. This is because the shared cache can increase the cache hit rates of multiple threads. Here, the shared cache employs miss status handling registers (MSHRs), which have the potential to accelerate difference schemes in scientific and engineering applications. Moreover, we show that 2 B/FLOP is enough for the CMVP to achieve high scalability when the on-chip cache is employed.
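To make the B/FLOP ratio and the caching argument in the abstract concrete, the following is a minimal sketch (not from the paper; all numbers are hypothetical) of how the ratio is computed and how an on-chip cache raises the *effective* bandwidth seen by the cores under a simple weighted-average model:

```python
# Illustrative sketch with hypothetical numbers, assuming a simple
# weighted-average bandwidth model (an assumption, not the paper's model).

def bytes_per_flop(mem_bandwidth_gbs: float, peak_gflops: float) -> float:
    """Ratio of off-chip memory bandwidth (GB/s) to peak arithmetic rate (GFLOP/s)."""
    return mem_bandwidth_gbs / peak_gflops

def effective_bandwidth(off_chip_gbs: float, cache_gbs: float, hit_rate: float) -> float:
    """Average bandwidth when a fraction `hit_rate` of accesses is served
    by a faster on-chip cache (simple weighted model)."""
    return hit_rate * cache_gbs + (1.0 - hit_rate) * off_chip_gbs

# A hypothetical 25.6 GFLOP/s vector core with 102.4 GB/s off-chip
# bandwidth meets the traditional 4 B/FLOP vector target:
print(bytes_per_flop(102.4, 25.6))            # 4.0

# Halving the pin bandwidth drops the ratio to 2 B/FLOP:
print(bytes_per_flop(51.2, 25.6))             # 2.0

# A shared on-chip cache with a 50% hit rate and 204.8 GB/s cache
# bandwidth restores much of the effective bandwidth:
print(effective_bandwidth(51.2, 204.8, 0.5))  # 128.0
```

Under this toy model, even a moderate shared-cache hit rate lets a configuration with only 2 B/FLOP of off-chip bandwidth sustain an effective bandwidth well above the off-chip limit, which is the intuition behind the paper's scalability claim.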