This work presents a general methodology for estimating the performance of an HPC workload running on a future hardware architecture. It demonstrates the methodology by estimating the performance of a significant scientific application, the Gyrokinetic Toroidal Code (GTC), when executing on Sun's proposed next-generation petascale computer architecture. For GTC, we identify the important phases of the iteration and perform low-level analysis that includes instruction tracing and component simulations of the processor and memory systems. This low-level analysis is complemented with scalability estimates based on modeling the MPI, OpenMP, and I/O activity in the code. The approach permits accurate end-to-end performance projections from the microarchitecture level to the petascale.