Impact of sharing-based thread placement on multithreaded architectures
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Software caching and computation migration in Olden
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Studies of Windows NT performance using dynamic execution traces
OSDI '96 Proceedings of the second USENIX symposium on Operating systems design and implementation
The performance implications of locality information usage in shared-memory multiprocessors
Journal of Parallel and Distributed Computing - Special issue on multithreading for multiprocessors
Performance evaluation of the Orca shared-object system
ACM Transactions on Computer Systems (TOCS)
Dynamic computation migration in DSM systems
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Improving Processor and Cache Locality in Fine-Grain Parallel Computations using Object-Affinity Scheduling and Continuation Passing
Managing Distributed, Shared L2 Caches through OS-Level Page Allocation
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Scheduling threads for constructive cache sharing on CMPs
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Performance scalability of a multi-core web server
Proceedings of the 3rd ACM/IEEE Symposium on Architecture for networking and communications systems
Locating cache performance bottlenecks using data profiling
Proceedings of the 5th European conference on Computer systems
Data marshaling for multi-core architectures
Proceedings of the 37th annual international symposium on Computer architecture
Building extensible networks with rule-based forwarding
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
FACT: a framework for adaptive contention-aware thread migrations
Proceedings of the 8th ACM International Conference on Computing Frontiers
Region scheduling: efficiently using the cache architectures via page-level affinity
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Affinity-aware DMA buffer management for reducing off-chip memory access
Proceedings of the 27th Annual ACM Symposium on Applied Computing
ADAPT: A framework for coscheduling multithreaded programs
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Fmeter: extracting indexable low-level system signatures by counting kernel function calls
Proceedings of the 13th International Middleware Conference
Model-based cache-aware dispatching of object-oriented software for multicore systems
Journal of Systems and Software
On modeling contention for shared caches in multi-core processors with techniques from ecology
Natural Computing: an international journal
Hi-index | 0.00 |
High performance on multicore processors requires that schedulers be reinvented. Traditional schedulers focus on keeping execution units busy by assigning each core a thread to run. Schedulers ought to focus, however, on high utilization of on-chip memory, rather than of execution cores, to reduce the impact of expensive DRAM and remote cache accesses. A challenge in achieving good use of on-chip memory is that the memory is split up among the cores in the form of many small caches. This paper argues for a form of scheduling that assigns each object and its operations to a specific core, moving a thread among the cores as it uses different objects.