Thread Tranquilizer: Dynamically reducing performance variation

Authors:
Kishore Kumar Pusukuri;Rajiv Gupta;Laxmi N. Bhuyan
Affiliations:
University of California, Riverside, CA;University of California, Riverside, CA;University of California, Riverside, CA
Venue:
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Year:
2012

Abstract

To realize the performance potential of multicore systems, we must effectively manage the interactions between memory reference behavior and the operating system policies for thread scheduling and migration decisions. We observe that these interactions lead to significant variations in the performance of a given application from one execution to the next, even when the program input remains unchanged and no other applications are running on the system. Our experiments with multithreaded programs, including the TATP database application, SPECjbb2005, and a subset of PARSEC and SPEC OMP programs, on a 24-core Dell PowerEdge R905 server running OpenSolaris confirm this observation. In this work we develop Thread Tranquilizer, an automatic technique for simultaneously reducing performance variation and improving performance by dynamically choosing appropriate memory allocation and process scheduling policies. Thread Tranquilizer uses simple utilities available on modern operating systems to monitor cache misses and thread context switches, and then uses the collected information to dynamically select appropriate memory allocation and scheduling policies. In our experiments, Thread Tranquilizer yields up to 98% (average 68%) reduction in performance variation and up to 43% (average 15%) improvement in performance over the default policies of OpenSolaris. We also demonstrate that Thread Tranquilizer simultaneously reduces performance variation and improves performance of the programs on Linux. Thread Tranquilizer is easy to use, as it requires no changes to the application source code or the OS kernel.
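
The monitor-then-select idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Sample` type, the threshold constants, the policy labels, and the use of standard deviation as the measure of performance variation are all assumptions made for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Sample:
    """One monitoring sample (units are assumed, not taken from the paper)."""
    cache_miss_rate: float      # e.g. misses per 1000 instructions
    context_switch_rate: float  # e.g. context switches per second


# Hypothetical thresholds; real values would have to be tuned per system.
MISS_RATE_THRESHOLD = 5.0
CSWITCH_RATE_THRESHOLD = 1000.0


def select_policies(samples):
    """Pick placeholder policy labels from averaged monitoring data.

    The labels are stand-ins; the abstract does not name the concrete
    memory allocation or scheduling policies that are chosen.
    """
    avg_miss = mean(s.cache_miss_rate for s in samples)
    avg_cs = mean(s.context_switch_rate for s in samples)
    memory = ("alternate-allocation" if avg_miss > MISS_RATE_THRESHOLD
              else "default-allocation")
    sched = ("alternate-scheduling" if avg_cs > CSWITCH_RATE_THRESHOLD
             else "default-scheduling")
    return memory, sched


def variation_reduction(baseline_times, tuned_times):
    """Percent reduction in run-to-run standard deviation (assumed metric)."""
    base = pstdev(baseline_times)
    tuned = pstdev(tuned_times)
    return 100.0 * (base - tuned) / base
```

For example, `select_policies([Sample(10.0, 2000.0)])` would return the two "alternate" labels under these assumed thresholds, and `variation_reduction` can compare repeated run times measured under the default and the selected policies.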