Performance tuning and analysis of future vector processors based on the roofline model
Proceedings of the 10th workshop on MEmory performance: DEaling with Applications, systems and architecture
Performance evaluation of NEC SX-9 using real science and engineering applications
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Vector supercomputers have been encountering the memory wall problem, and their memory bandwidth per flop/s rate has decreased. To compensate for the insufficient memory bandwidth per flop/s rate, an on-chip vector cache has been proposed for vector processors. Although vector caching is effective in increasing sustained performance to a certain degree, it still needs supporting software and hardware mechanisms to realize its potential. To this end, we propose miss status handling registers (MSHR) and a prefetch mechanism. This paper evaluates the performance of the vector cache with the MSHR and the prefetch mechanism on a vector supercomputer across three leading scientific applications. The MSHR is an effective mechanism for handling subsequent vector loads of the same data, which frequently appear in different schemes. The experimental results indicate that the MSHR can improve the computational performance of scientific applications by 1.45×. Moreover, we examine the performance of the prefetch mechanism on the vector cache. The prefetch mechanism increases the computational performance by 1.6×. Accordingly, the MSHR and the prefetch mechanism are very effective optimization options for vector caching in future vector supercomputers, even if those supercomputers cannot maintain the current memory bandwidth per flop/s rate.
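The role of the MSHR described above, merging a subsequent load with a miss that is already in flight for the same data, can be illustrated with a toy model. The following is a minimal sketch and not the evaluated hardware: the class `ToyVectorCache`, the helper `count_requests`, the fixed 100-cycle memory latency, and the block-granular trace are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVectorCache:
    """Hypothetical cache model with an optional MSHR table (illustrative only)."""
    use_mshr: bool = True
    latency: int = 100                              # assumed memory latency in cycles
    cache: set = field(default_factory=set)         # blocks already resident
    in_flight: dict = field(default_factory=dict)   # block -> cycle its data arrives
    memory_requests: int = 0                        # requests actually sent to memory

    def load(self, block: int, now: int) -> int:
        """Return the cycle at which the data for `block` becomes available."""
        # Move completed misses from the in-flight table into the cache.
        for b, ready in list(self.in_flight.items()):
            if ready <= now:
                self.cache.add(b)
                del self.in_flight[b]
        if block in self.cache:
            return now                              # cache hit
        if self.use_mshr and block in self.in_flight:
            return self.in_flight[block]            # merged with the outstanding miss
        # No MSHR entry to merge with: issue a new memory request.
        self.memory_requests += 1
        ready = now + self.latency
        self.in_flight.setdefault(block, ready)     # the first miss fills the cache
        return ready


def count_requests(trace, use_mshr: bool) -> int:
    """Replay (cycle, block) pairs and count requests sent to memory."""
    model = ToyVectorCache(use_mshr=use_mshr)
    for cycle, block in trace:
        model.load(block, cycle)
    return model.memory_requests


# Two back-to-back vector loads touching the same four blocks, the second
# arriving while the misses from the first are still outstanding.
trace = [(c, c % 4) for c in range(8)]
with_mshr = count_requests(trace, use_mshr=True)
without_mshr = count_requests(trace, use_mshr=False)
```

In this sketch the second vector load reuses the outstanding misses when the MSHR is enabled, so it adds no memory traffic; without it, every element of the second load issues its own request. The model only counts requests and says nothing about the speedups reported in the abstract.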