A scalar architecture for pseudo vector processing based on slide-windowed registers

Authors:
Hiroshi Nakamura;Taisuke Boku;Hideo Wada;Hiromitsu Imori;Ikuo Nakata;Yasuhiro Inagami;Kisaburo Nakazawa;Yoshiyuki Yamashita
Venue:
ICS '93 Proceedings of the 7th international conference on Supercomputing
Year:
1993

Abstract

In this paper, we present a new scalar architecture for high-speed vector processing. Without using cache memory, the proposed architecture tolerates main memory access latency by introducing slide-windowed floating-point registers with a data preloading feature and pipelined memory. The architecture maintains upward compatibility with existing scalar architectures. In the new architecture, software can control the window structure, which is its advantage over our previous work on register windows. Because of this, registers are utilized more flexibly and computational efficiency is greatly enhanced. Furthermore, this flexibility makes it easier for the compiler to generate efficient object code.

We have evaluated its performance on the Livermore Fortran Kernels. The evaluation results show that the proposed architecture reduces the penalty of main memory access better than an ordinary scalar processor and a processor with cache prefetching. The proposed architecture with 64 registers tolerates a memory access latency of 30 CPU cycles. Compared with our previous work, the proposed architecture hides longer memory access latency with fewer registers.
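
The latency-hiding idea in the abstract can be illustrated with a toy cycle-count model. This is only a sketch under stated assumptions, not the paper's design or evaluation method: the function `loop_cycles`, its parameters, and the cost model (one memory operand per iteration, one cycle to issue a load, fully pipelined memory, one register held per in-flight preload) are all hypothetical and chosen for illustration.

```python
def loop_cycles(n_iters, latency, work_cycles, preload_distance):
    """Toy cycle count for a loop that consumes one memory operand per iteration.

    Assumptions (illustrative only, not taken from the paper):
      - issuing a load or preload costs 1 cycle,
      - memory is fully pipelined, so requests can overlap,
      - data arrives `latency` cycles after the request is issued,
      - each iteration then does `work_cycles` cycles of arithmetic.

    preload_distance == 0 models a blocking load issued in the same iteration.
    preload_distance == d models a preload issued d iterations before its use,
    into a register that a sliding window would expose d iterations later.
    """
    cycle = 0
    issue_time = {}  # iteration index -> cycle at which its load was issued

    # Prologue: issue preloads for the first d iterations, one per cycle.
    for i in range(min(preload_distance, n_iters)):
        issue_time[i] = cycle
        cycle += 1

    for i in range(n_iters):
        # Issue the load for iteration i + d (iteration i itself when d == 0).
        target = i + preload_distance
        if target < n_iters:
            issue_time[target] = cycle
            cycle += 1
        # Stall until the operand for iteration i has arrived, then compute.
        cycle = max(cycle, issue_time[i] + latency)
        cycle += work_cycles
    return cycle


if __name__ == "__main__":
    blocking = loop_cycles(n_iters=100, latency=30, work_cycles=4, preload_distance=0)
    preloaded = loop_cycles(n_iters=100, latency=30, work_cycles=4, preload_distance=8)
    print(blocking, preloaded)
```

In this model, a 30-cycle latency is fully hidden once the preload distance times the per-iteration cost reaches the latency; with the illustrative numbers above, the blocking loop takes 3400 cycles and the preloading loop takes 521. The numbers say nothing about the actual architecture, whose register counts and latencies are those reported in the abstract.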