The cache performance and optimizations of blocked algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Data cache performance of supercomputer applications
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Tile size selection using cache organization and data layout
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Memory bandwidth limitations of future microprocessors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Augmenting Loop Tiling with Data Alignment for Improved Cache Performance
IEEE Transactions on Computers - Special issue on cache memory and related problems
Performance of lattice QCD programs on CP-PACS
Parallel Computing - Special issue on high performance computing in lattice QCD
Reconfigurable caches and their application to media processing
Proceedings of the 27th annual international symposium on Computer architecture
SCIMA: Software Controlled Integrated Memory Architecture for High Performance Computing
ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
A Multi-Level Memory System Architecture for High-Performance DSP Applications
ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
Hi-index | 0.00 |
The performance gap between processor and memory is very serious problem in high performance computing because effective performance is limited by memory ability. In order to overcome this problem, we propose a new VLSI architecture called SCIMA which integrates software controllable memory into a processor chip in addition to ordinary data cache. Most of data access is regular in high performance computing. Software controllable memory is better at making good use of the regularity than conventional cache. This paper presents its architecture and performance evaluation. In SCIMA, the ratio of software controllable memory and cache can be dynamically changed. Due to this feature, SCIMA is upper compatible with conventional memory architecture. Performance is evaluated by using CG and FT kernels of NPB Benchmark and a real application of QCD (Quantum ChromoDynamics). The evaluation results reveal that SCIMA is superior to conventional cache-based architecture. It is also revealed that the superiority of SCIMA increases when access latency of off-chip memory increases or its relative throughput gets lower.