A parallel application benefits from scheduling policies that take a global view of the application's process working set. As interactions among cooperating processes increase, mechanisms that reduce waiting within one or more of those processes become more important. In particular, collective operations such as barriers and reductions are extremely sensitive to events that are usually harmless, such as context switches among members of the process working set. For the last 18 months, we have been researching the impact of random short-lived interruptions, such as timer-decrement processing and periodic daemon activity, and developing strategies to minimize their impact on large processor-count SPMD bulk-synchronous programs. We present a novel co-scheduling scheme for improving the performance of fine-grain collective activities such as barriers and reductions, describe an implementation consisting of operating system kernel modifications and a run-time system, and present a set of empirical results comparing the technique with traditional operating system scheduling. Our results indicate a speedup of over 300% on synchronizing collectives.
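The sensitivity of barriers to short interruptions can be illustrated with a toy model. The sketch below is not the co-scheduling implementation described above; it is a minimal simulation under stated assumptions: every process suffers one periodic interruption of fixed length, each step ends in a barrier, and "co-scheduled" simply means that all interruptions share the same phase. The function names and all parameter values (microsecond units, 512 processes) are hypothetical and chosen only for illustration.

```python
import random

def finish_time(start, work, phase, period, noise_len):
    """Time at which one process completes `work` units of computation.

    The process loses the CPU for `noise_len` units at the start of every
    `period`, offset by `phase`. All values are integers (e.g. microseconds)
    and noise_len must be smaller than period.
    """
    t, remaining = start, work
    while remaining > 0:
        pos = (t - phase) % period
        if pos < noise_len:
            # Inside an interruption: wait until it ends.
            t += noise_len - pos
            pos = noise_len
        # Compute until the work is done or the next interruption begins.
        run = min(remaining, period - pos)
        t += run
        remaining -= run
    return t

def bsp_runtime(num_procs, num_steps, work, period, noise_len,
                coscheduled, seed=0):
    """Wall time of a bulk-synchronous loop: compute, then barrier."""
    rng = random.Random(seed)
    if coscheduled:
        # Assumption: co-scheduling aligns every interruption in time.
        phases = [0] * num_procs
    else:
        phases = [rng.randrange(period) for _ in range(num_procs)]
    now = 0
    for _ in range(num_steps):
        # The barrier releases only when the slowest process arrives.
        now = max(finish_time(now, work, p, period, noise_len)
                  for p in phases)
    return now

if __name__ == "__main__":
    args = dict(num_procs=512, num_steps=1000, work=100,
                period=1000, noise_len=50)
    uncoordinated = bsp_runtime(coscheduled=False, **args)
    aligned = bsp_runtime(coscheduled=True, **args)
    print("uncoordinated:", uncoordinated)
    print("co-scheduled: ", aligned)
```

In this model, uncoordinated interruptions delay almost every step, because with many processes some process is nearly always interrupted before the barrier. Aligned interruptions cost each process the same time at the same moment, so the barrier adds no extra waiting. The model ignores communication cost and variable interruption lengths, so its numbers say nothing about the speedup reported above.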