Paravirtualization effect on single- and multi-threaded memory-intensive linear algebra software

Authors:
Lamia Youseff;Keith Seymour;Haihang You;Dmitrii Zagorodnov;Jack Dongarra;Rich Wolski
Affiliations:
Dept. of Computer Science, University of California, Santa Barbara, USA;Dept. of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, USA;Dept. of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, USA;Dept. of Computer Science, University of California, Santa Barbara, USA;Dept. of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, USA;Dept. of Computer Science, University of California, Santa Barbara, USA
Venue:
Cluster Computing
Year:
2009

Abstract

Previous studies have shown that paravirtualization imposes minimal performance overhead on High Performance Computing (HPC) workloads while offering numerous benefits to the field. In this study, we investigate the impact of paravirtualization on the performance of automatically tuned software systems. We compare peak performance, performance degradation under constrained memory, performance degradation in multi-threaded applications, and inter-VM shared memory performance. To this end, we examine how well ATLAS, a quintessential example of an autotuning software system, tunes the BLAS library routines for paravirtualized systems. Our results show that the combination of ATLAS and Xen paravirtualization delivers native execution performance and nearly identical memory hierarchy performance profiles in both single- and multi-threaded scenarios. Furthermore, we show that memory can be shared among OS instances at native speeds. These results expose new benefits for memory-intensive applications, arising from the ability to slim down the guest OS without affecting system performance. In addition, our findings support a novel and attractive deployment scenario for computational science and engineering codes on virtual clusters and computational clouds.