The NAS parallel benchmarks—summary and preliminary results

Authors:
D. H. Bailey; E. Barszcz; J. T. Barton; D. S. Browning; R. L. Carter; L. Dagum; R. A. Fatoohi; P. O. Frederickson; T. A. Lasinski; R. S. Schreiber; H. D. Simon; V. Venkatakrishnan; S. K. Weeratunga
Affiliations:
Numerical Aerodynamic Simulation (NAS) Systems Division, NASA Ames Research Center, Mail Stop T045-1, Moffett Field, CA (all authors)
Venue:
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Year:
1991

Cited by 43 papers:

An improved supercomputer sorting benchmark
Proceedings of the 1992 ACM/IEEE conference on Supercomputing

Evaluation of mechanisms for fine-grained parallel programs in the J-machine and the CM-5
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture

Performance of cached DRAM organizations in vector supercomputers
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture

Implementing the multiprefix operation on parallel and vector computers
SPAA '93 Proceedings of the fifth annual ACM symposium on Parallel algorithms and architectures

Tera hardware-software cooperation
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing

A systematic approach to synthesize data alignment directives for distributed memory machines
Nordic Journal of Computing

Making Sequential Consistency Practical in Titanium
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing

Scaling MPI to short-memory MPPs such as BG/L
Proceedings of the 20th annual international conference on Supercomputing

Carbon: architectural support for fine-grained parallelism on chip multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture

Prefetch throttling and data pinning for improving performance of shared caches
Proceedings of the 2008 ACM/IEEE conference on Supercomputing

Identifying Inter-task Communication in Shared Memory Programming Models
IWOMP '09 Proceedings of the 5th International Workshop on OpenMP: Evolving OpenMP in an Age of Extreme Parallelism

CG-Cell: an NPB benchmark implementation on cell broadband engine
ICDCN'08 Proceedings of the 9th international conference on Distributed computing and networking

Operating system support for mitigating software scalability bottlenecks on asymmetric multicore processors
Proceedings of the 7th ACM international conference on Computing frontiers

Enhancing L2 organization for CMPs with a center cell
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing

Executing MPI programs on virtual machines in an internet sharing system
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing

Dynamic adaptive scheduling for virtual machines
Proceedings of the 20th international symposium on High performance distributed computing

Parkour: parallel speedup estimates for serial programs
HotPar'11 Proceedings of the 3rd USENIX conference on Hot topic in parallelism

Per-call energy saving strategies in all-to-all communications
EuroMPI'11 Proceedings of the 18th European MPI Users' Group conference on Recent advances in the message passing interface

Kismet: parallel speedup estimates for serial programs
Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications

Private virtual cluster: infrastructure and protocol for instant grids
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing

Complementing user-level coarse-grain parallelism with implicit speculative parallelism
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture

DMA-circular: an enhanced high level programmable DMA controller for optimized management of on-chip local memories
Proceedings of the 9th conference on Computing Frontiers

Performance characterization of global address space applications: a case study with NWChem
Concurrency and Computation: Practice & Experience

Brief announcement: the problem based benchmark suite
Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures

Riding Out the Storm: How to Deal with the Complexity of Grid and Cloud Management
Journal of Grid Computing

Hardware-software coherence protocol for the coexistence of caches and local memories
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Significantly reducing MPI intercommunication latency and power overhead in both embedded and HPC systems
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers

Understanding i/o performance using i/o skeletal applications
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing

A Transformation Framework for Optimizing Task-Parallel Programs
ACM Transactions on Programming Languages and Systems (TOPLAS)

A comparative study of high-performance computing on the cloud
Proceedings of the 22nd international symposium on High-performance parallel and distributed computing

Dynamic threshold for imbalance assessment on load balancing for multicore systems
Computers and Electrical Engineering

Semi-automatic extraction of software skeletons for benchmarking large-scale parallel applications
Proceedings of the 2013 ACM SIGSIM conference on Principles of advanced discrete simulation

Energy saving strategies for parallel applications with point-to-point communication phases
Journal of Parallel and Distributed Computing

Enabling fair pricing on HPC systems with node sharing
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Energy efficiency in high-performance computing with and without knowledge of applications and services
International Journal of High Performance Computing Applications

FLEX-MPI: an MPI extension for supporting dynamic load balancing on heterogeneous non-dedicated systems
Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing

Wavelength stealing: an opportunistic approach to channel sharing in multi-chip photonic interconnects
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture

Load balancing non-uniform parallel computations
Proceedings of the 2013 workshop on Programming based on actors, agents, and decentralized control

Fine-grained Benchmark Subsetting for System Selection
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization

Towards fair and efficient SMP virtual machine scheduling
Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming

Improving application behavior on heterogeneous manycore systems through kernel mapping
Parallel Computing

Measurement of the latency parameters of the Multi-BSP model: a multicore benchmarking approach
The Journal of Supercomputing

Integrating profile-driven parallelism detection and machine-learning-based mapping
ACM Transactions on Architecture and Code Optimization (TACO)

Abstract