Symbiotic jobscheduling for a simultaneous multithreaded processor
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
OS-Controlled Cache Predictability for Real-Time Systems
RTAS '97 Proceedings of the 3rd IEEE Real-Time Technology and Applications Symposium (RTAS '97)
A New Memory Monitoring Scheme for Memory-Aware Scheduling and Partitioning
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Memory Bandwidth Aware Scheduling for SMP Cluster Nodes
PDP '05 Proceedings of the 13th Euromicro Conference on Parallel, Distributed and Network-Based Processing
Pin: building customized program analysis tools with dynamic instrumentation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
StatCache: a probabilistic approach to efficient and accurate data locality analysis
ISPASS '04 Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Managing Distributed, Shared L2 Caches through OS-Level Page Allocation
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Effective Management of DRAM Bandwidth in Multicore Processors
PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
Improving Performance Isolation on Chip Multiprocessors via an Operating System Scheduler
PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
Analysis and approximation of optimal co-scheduling on chip multiprocessors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Towards practical page coloring-based multicore cache management
Proceedings of the 4th ACM European conference on Computer systems
HASS: a scheduler for heterogeneous multicore systems
ACM SIGOPS Operating Systems Review
Rate-based QoS techniques for cache/memory in CMP platforms
Proceedings of the 23rd international conference on Supercomputing
vGreen: a system for energy efficient computing in virtualized environments
Proceedings of the 14th ACM/IEEE international symposium on Low power electronics and design
Managing contention for shared resources on multicore processors
Communications of the ACM
Managing Contention for Shared Resources on Multicore Processors
Queue - Power Management
Q-clouds: managing performance interference effects for QoS-aware clouds
Proceedings of the 5th European conference on Computer systems
AKULA: a toolset for experimenting and developing thread placement algorithms on multicore systems
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
A case for NUMA-aware contention management on multicore systems
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Journal of Parallel and Distributed Computing
Online cache modeling for commodity multicore processors
ACM SIGOPS Operating Systems Review
Building extensible networks with rule-based forwarding
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Memory Latency Reduction via Thread Throttling
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
All-window profiling and composable models of cache sharing
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
ULCC: a user-level facility for optimizing shared cache performance on multicores
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Directly characterizing cross core interference through contention synthesis
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
In search for contention-descriptive metrics in HPC cluster environment
Proceedings of the 2nd ACM/SPEC International Conference on Performance engineering
Proceedings of the sixth conference on Computer systems
Proceedings of the international symposium on Memory management
Dynamic adaptive scheduling for virtual machines
Proceedings of the 20th international symposium on High performance distributed computing
Enhancement of Xen's scheduler for MapReduce workloads
Proceedings of the 20th international symposium on High performance distributed computing
The impact of memory subsystem resource sharing on datacenter applications
Proceedings of the 38th annual international symposium on Computer architecture
DBAR: an efficient routing algorithm to support multiple concurrent applications in networks-on-chip
Proceedings of the 38th annual international symposium on Computer architecture
Proceedings of the 1st International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era
Pervasive parallelism for managed runtimes
HotPar'11 Proceedings of the 3rd USENIX conference on Hot topic in parallelism
A case for NUMA-aware contention management on multicore systems
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
FACT: a framework for adaptive contention-aware thread migrations
Proceedings of the 8th ACM International Conference on Computing Frontiers
Scheduling javasymphony applications on many-core parallel computers
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
Proceedings of the 2nd ACM Symposium on Cloud Computing
Agent based load balancing scheme using affinity processor scheduling for multicore architectures
WSEAS Transactions on Computers
A cluster computer performance predictor for memory scheduling
ICA3PP'11 Proceedings of the 11th international conference on Algorithms and architectures for parallel processing - Volume Part II
Thread Tranquilizer: Dynamically reducing performance variation
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
CRUISE: cache replacement and utility-aware scheduling
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
DejaVu: accelerating resource allocation in virtualized environments
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Region scheduling: efficiently using the cache architectures via page-level affinity
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
REEact: a customizable virtual execution manager for multicore platforms
VEE '12 Proceedings of the 8th ACM SIGPLAN/SIGOPS conference on Virtual Execution Environments
Bubble-Up: increasing utilization in modern warehouse scale computers via sensible co-locations
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Reducing memory interference in multicore systems via application-aware memory channel partitioning
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
ACM Transactions on Computer Systems (TOCS)
Machine learning based performance prediction for multi-core simulation
MIWAI'11 Proceedings of the 5th international conference on Multi-Disciplinary Trends in Artificial Intelligence
Share memory aware scheduler: balancing performance and fairness
Proceedings of the great lakes symposium on VLSI
Reuse distance based performance modeling and workload mapping
Proceedings of the 9th conference on Computing Frontiers
Toward predictable performance in software packet-processing platforms
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Courteous cache sharing: being nice to others in capacity management
Proceedings of the 49th Annual Design Automation Conference
Providing fairness on shared-memory multiprocessors via process scheduling
Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems
Compiling for niceness: mitigating contention for QoS in warehouse scale computers
Proceedings of the Tenth International Symposium on Code Generation and Optimization
Achieving application-centric performance targets via consolidation on multicores: myth or reality?
Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Cache Conscious Task Regrouping on Multicore Processors
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Nonuniform memory affinity strategy in multithreaded sparse matrix computations
Proceedings of the 2012 Symposium on High Performance Computing
Dynamic virtual machine scheduling in clouds for architectural shared resources
HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Ccomputing
Reducing last level cache pollution in NUMA multicore systems for improving cache performance
ICCSA'12 Proceedings of the 12th international conference on Computational Science and Its Applications - Volume Part III
Scalability-based manycore partitioning
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Efficient techniques for predicting cache sharing and throughput
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
A software memory partition approach for eliminating bank-level interference in multicore systems
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Survey of scheduling techniques for addressing shared resources in multicore processors
ACM Computing Surveys (CSUR)
When less is more (LIMO):controlled parallelism forimproved efficiency
Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems
Critical path-based thread placement for NUMA systems
ACM SIGMETRICS Performance Evaluation Review
Resource-freeing attacks: improve your cloud performance (at your neighbor's expense)
Proceedings of the 2012 ACM conference on Computer and communications security
Exploiting inter-sequence correlations for program behavior prediction
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
ADAPT: A framework for coscheduling multithreaded programs
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
The Impact of Hypervisor Layer on Database Applications
UCC '12 Proceedings of the 2012 IEEE/ACM Fifth International Conference on Utility and Cloud Computing
Providing performance guarantees in multipass network processors
IEEE/ACM Transactions on Networking (TON)
Moths: Mobile threads for on-chip networks
ACM Transactions on Embedded Computing Systems (TECS) - Special section on ESTIMedia'12, LCTES'11, rigorous embedded systems design, and multiprocessor system-on-chip for cyber-physical systems
HOTL: a higher order theory of locality
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
To hardware prefetch or not to prefetch?: a virtualized environment study and core binding approach
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Traffic management: a holistic approach to memory placement on NUMA systems
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Cache-Conscious Wavefront Scheduling
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Interference and locality-aware task scheduling for MapReduce applications in virtual clusters
Proceedings of the 22nd international symposium on High-performance parallel and distributed computing
COSMIC: middleware for high performance and reliable multiprocessing on xeon phi coprocessors
Proceedings of the 22nd international symposium on High-performance parallel and distributed computing
Redesigning MPI shared memory communication for large multi-core architecture
Computer Science - Research and Development
Analyzing resource interdependencies in multi-core architectures to improve scheduling decisions
Proceedings of the 28th Annual ACM Symposium on Applied Computing
Scheduling optimization in multicore multithreaded microprocessors through dynamic modeling
Proceedings of the ACM International Conference on Computing Frontiers
Bubble-flux: precise online QoS management for increased utilization in warehouse scale computers
Proceedings of the 40th Annual International Symposium on Computer Architecture
Interference resilient PDES on multi-core systems: towards proportional slowdown
Proceedings of the 2013 ACM SIGSIM conference on Principles of advanced discrete simulation
Reducing contention through priority updates
Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
Characterization and modeling of PIDX parallel I/O for performance optimization
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Coordinated power-performance optimization in manycores
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
L1-bandwidth aware thread allocation in multicore SMT processors
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
An empirical model for predicting cross-core performance interference on multicore processors
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference
ACM Transactions on Architecture and Code Optimization (TACO)
On modeling contention for shared caches in multi-core processors with techniques from ecology
Natural Computing: an international journal
Towards software performance engineering for multicore and manycore systems
ACM SIGMETRICS Performance Evaluation Review
ACM Transactions on Architecture and Code Optimization (TACO)
Exploiting multi-core nodes in peer-to-peer grids
Journal of Parallel and Distributed Computing
Virtual machine consolidation based on interference modeling
The Journal of Supercomputing
Hi-index | 0.02 |
Contention for shared resources on multicore processors remains an unsolved problem in existing systems despite significant research efforts dedicated to this problem in the past. Previous solutions focused primarily on hardware techniques and software page coloring to mitigate this problem. Our goal is to investigate how and to what extent contention for shared resource can be mitigated via thread scheduling. Scheduling is an attractive tool, because it does not require extra hardware and is relatively easy to integrate into the system. Our study is the first to provide a comprehensive analysis of contention-mitigating techniques that use only scheduling. The most difficult part of the problem is to find a classification scheme for threads, which would determine how they affect each other when competing for shared resources. We provide a comprehensive analysis of such classification schemes using a newly proposed methodology that enables to evaluate these schemes separately from the scheduling algorithm itself and to compare them to the optimal. As a result of this analysis we discovered a classification scheme that addresses not only contention for cache space, but contention for other shared resources, such as the memory controller, memory bus and prefetching hardware. To show the applicability of our analysis we design a new scheduling algorithm, which we prototype at user level, and demonstrate that it performs within 2\% of the optimal. We also conclude that the highest impact of contention-aware scheduling techniques is not in improving performance of a workload as a whole but in improving quality of service or performance isolation for individual applications.