A study has been made of how the cost-effectiveness gained from improvements in VLSI technology can be applied to a scientific computer system without loss of performance. The result is a parallel computer, ADENA (Alternating Direction Edition Nexus Array), with a core consisting of four kinds of VLSI chips: two for the processor elements (PEs) and two for the interprocessor network (plus some memory chips). An overview of ADENA and an analysis of its performance are given. The design considerations for the PEs incorporated in ADENA are discussed. The factors that limit performance in a parallel processing environment are analyzed, and the measures employed at the LSI design level to mitigate them are described. The 42.6 sq cm CMOS PEs reach a peak performance of 20 MFLOPS. A 256-PE ADENA has achieved 1.5 GFLOPS, and 300 to 400 MFLOPS for PDE applications.
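
To put the quoted figures in proportion, the short sketch below computes what fraction of the aggregate peak they represent. It rests on two assumptions that the abstract does not state outright: that the 20 MFLOPS peak is a per-PE figure, and that 1 GFLOPS is taken as 1000 MFLOPS. The function name `efficiency` and the constants are illustrative only.

```python
def efficiency(achieved_mflops: float, n_pes: int, peak_per_pe_mflops: float) -> float:
    """Return achieved throughput as a fraction of aggregate peak throughput."""
    aggregate_peak = n_pes * peak_per_pe_mflops
    return achieved_mflops / aggregate_peak


N_PES = 256            # machine size quoted in the abstract
PEAK_PER_PE = 20.0     # MFLOPS; ASSUMPTION: the quoted peak is per PE

aggregate_peak = N_PES * PEAK_PER_PE                  # MFLOPS
overall = efficiency(1500.0, N_PES, PEAK_PER_PE)      # 1.5 GFLOPS figure
pde_low = efficiency(300.0, N_PES, PEAK_PER_PE)       # lower PDE figure
pde_high = efficiency(400.0, N_PES, PEAK_PER_PE)      # upper PDE figure

print(f"aggregate peak: {aggregate_peak:.0f} MFLOPS")
print(f"1.5 GFLOPS figure: {overall:.1%} of aggregate peak")
print(f"PDE applications: {pde_low:.1%} to {pde_high:.1%} of aggregate peak")
```

Under these assumptions the aggregate peak is 5120 MFLOPS, so the 1.5 GFLOPS figure is roughly 29% of peak and the PDE range is roughly 6% to 8%. If the 20 MFLOPS figure is not per PE, these ratios do not apply.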