Modern processor architectures are increasingly complex and heterogeneous, and often require software solutions tailored to the specific hardware characteristics of each processor model. In this article, we address this problem for two processors that feature Simultaneous MultiThreading (SMT), a technique that improves the occupancy of a core's internal execution units by feeding them a sustained stream of instructions from more than one thread. We use the AMD Bulldozer and IBM POWER7 processors as case studies for hardware-specific performance optimizations that increase the variety of instructions sent to each core, so as to maximize the occupancy of all its execution units. WorkOver, presented in this article, improves the performance of floating-point-intensive workloads on Linux-based operating systems through better thread scheduling. WorkOver is a user-space monitoring tool that automatically identifies FPU-intensive threads and schedules them more efficiently, without requiring any patches or modifications at the kernel level. Our measurements using standard benchmark suites show that speedups of up to 20% can be achieved simply by letting WorkOver monitor applications and schedule their threads, without any modification of the workload.
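To make the idea of identifying FPU-intensive threads and spreading them across cores more concrete, the following is a minimal sketch of one possible placement policy. It is not WorkOver's actual algorithm, which the abstract does not detail. The names `ThreadSample`, `is_fpu_intensive`, and `plan_placement`, the 0.3 threshold, and the grouping of cores into sets that share floating-point resources are all assumptions made for illustration; the counter values are supplied by hand rather than read from hardware.

```python
from dataclasses import dataclass


@dataclass
class ThreadSample:
    """Hypothetical per-thread counter reading over one sampling interval."""
    tid: int
    fp_ops: int        # floating-point operations retired
    instructions: int  # total instructions retired


def fpu_ratio(sample):
    """Fraction of retired instructions that were floating-point operations."""
    if sample.instructions == 0:
        return 0.0
    return sample.fp_ops / sample.instructions


def is_fpu_intensive(sample, threshold=0.3):
    """Classify a thread as FPU-intensive (the threshold is an assumed value)."""
    return fpu_ratio(sample) >= threshold


def plan_placement(samples, core_groups):
    """Map thread ids to core ids.

    core_groups is a list of tuples; each tuple holds the cores assumed to
    share floating-point resources. Threads are ordered from most to least
    FPU-heavy and handed out one group at a time, so the heaviest threads
    land in different groups before any group receives a second thread.
    """
    ordered = sorted(samples, key=fpu_ratio, reverse=True)
    width = max(len(group) for group in core_groups)
    # First core of every group, then second core of every group, and so on.
    slots = [group[i] for i in range(width)
             for group in core_groups if i < len(group)]
    # Wrap around if there are more threads than cores.
    return {s.tid: slots[n % len(slots)] for n, s in enumerate(ordered)}


samples = [
    ThreadSample(tid=1, fp_ops=80, instructions=100),
    ThreadSample(tid=2, fp_ops=70, instructions=100),
    ThreadSample(tid=3, fp_ops=5, instructions=100),
    ThreadSample(tid=4, fp_ops=1, instructions=100),
]
placement = plan_placement(samples, core_groups=[(0, 1), (2, 3)])
```

In this sketch the two FPU-heavy threads (1 and 2) are placed in different core groups, and each is paired with a thread that makes little use of the FPU. On Linux, a user-space tool could apply such a plan with a CPU-affinity call such as `os.sched_setaffinity`, which requires no kernel changes.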