For 28 years, the author has worked on machines spanning three revolutions in supercomputer design: vector supercomputing, parallel supercomputing on multiple CPUs, and supercomputing on hierarchically organized clusters of cache-based microprocessors. He describes these revolutions from the perspective of numerical algorithms and the programs that implement them, and looks ahead to the coming of distributed shared-memory (DSM) and shared-memory multiprocessor (SMP) architectures. These new architectures can combine the performance benefits of massively parallel computing with the flexibility of shared-memory multiprocessors. Like all supercomputer systems, however, they will strongly favor certain numerical algorithms and force others to execute at much slower speeds. The usefulness of the favored algorithms and the ease with which they can be implemented, together with the gigaflops or teraflops they achieve, will determine the scientific output of these machines. Ultimately, it is this scientific output that is the true measure of a supercomputer.