This paper discusses the storage control unit of the Hitachi S-3800 supercomputer series, which achieves 8 GFLOPS on each of up to four processors in a shared-memory multiprocessor configuration. The storage control unit is distributed across the V-SCs (vector-processor-side storage control units) and the M-SCs (main-storage-side storage control units), and achieves a total memory throughput of 128 gigabytes per second. This distributed design scales with the number of processors and segmented parallel pipelines simply by reconnecting the flat cables between the V-SCs and M-SCs.

The distributed storage control unit also sustains high memory throughput for all types of vector-load and vector-store instructions. It features three new storage control schemes: (1) a hierarchical request-identification-number assignment scheme, which allows independent parallel memory access control in the V-SCs and M-SCs and also enhances indirect memory access performance; (2) a multistage address modification scheme, which achieves conflict-free constant-stride parallel memory access in both the V-SCs and M-SCs; and (3) an instruction-based variable priority scheme, which achieves stable high memory throughput independent of programs executing on the other processors. Performance measurements show the benefit of these schemes in the scalable distributed storage control unit for the S-3800 series.
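
To make the idea behind scheme (2) concrete, the sketch below contrasts plain low-order interleaving with a generic single-stage skewed bank mapping for constant-stride accesses. It is not the S-3800's multistage address modification scheme, whose details the abstract does not give. The bank count, the skew formula, and the names `interleaved_bank`, `skewed_bank`, and `max_bank_load` are all assumptions chosen for illustration.

```python
from collections import Counter

NUM_BANKS = 8  # hypothetical bank count, chosen only for illustration


def interleaved_bank(addr, banks=NUM_BANKS):
    """Plain low-order interleaving: consecutive addresses go to consecutive banks."""
    return addr % banks


def skewed_bank(addr, banks=NUM_BANKS):
    """Generic single-stage skew (an assumption, not the S-3800 mapping):
    each row of `banks` addresses is rotated by its row number."""
    return (addr + addr // banks) % banks


def max_bank_load(stride, bank_of, banks=NUM_BANKS, base=0):
    """Issue `banks` accesses at a constant stride and return the largest
    number of them that land on any single bank (1 means conflict-free)."""
    hits = Counter(bank_of(base + i * stride, banks) for i in range(banks))
    return max(hits.values())


if __name__ == "__main__":
    # Compare the two mappings over a few power-of-two strides.
    for stride in (1, 2, 4, 8, 16):
        print(stride,
              max_bank_load(stride, interleaved_bank),
              max_bank_load(stride, skewed_bank))
```

With eight banks, a stride of 8 sends every access to the same bank under plain interleaving, whereas the skewed mapping spreads the same eight accesses over eight distinct banks. A stride of 16 still produces conflicts under this single skew, so one level of address modification does not by itself cover every stride.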