This paper describes a combined approach for improving thread locality that uses the hardware performance monitors of modern processors and program-centric code annotations to guide thread scheduling on SMPs. The approach relies on a shared state cache model to compute expected thread footprints in the cache on-line. The accuracy of the model has been analyzed by simulations involving a set of parallel applications. We demonstrate how the cache model can be used to implement several practical locality-based thread scheduling policies with little overhead. Active Threads, a portable, high-performance thread system, has been built and used to investigate the performance impact of locality scheduling for several applications.
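To make the idea of an on-line footprint estimate concrete, the sketch below uses a textbook probabilistic model, not the model from this paper: it assumes a direct-mapped cache of `cache_lines` lines in which every miss replaces a uniformly random line. The function names, the `pick_next` policy, and the miss counts (which would come from hardware performance counters in a real system) are all illustrative assumptions, and state sharing between threads is not modeled.

```python
from typing import Dict


def decay_footprint(footprint: float, cache_lines: int, misses: int) -> float:
    """Expected lines of a descheduled thread that survive `misses`
    cache misses taken by other threads (assumed uniform replacement)."""
    return footprint * (1.0 - 1.0 / cache_lines) ** misses


def grow_footprint(footprint: float, cache_lines: int, misses: int) -> float:
    """Expected footprint of the running thread after it takes `misses`
    misses, each assumed to load a line into a uniformly random slot."""
    not_owned = cache_lines - footprint
    return cache_lines - not_owned * (1.0 - 1.0 / cache_lines) ** misses


def pick_next(footprints: Dict[str, float]) -> str:
    """Hypothetical locality policy: run the ready thread with the
    largest expected footprint still in this processor's cache."""
    return max(footprints, key=footprints.get)


if __name__ == "__main__":
    cache_lines = 1024          # assumed cache size in lines
    ready = {"t1": 600.0, "t2": 200.0}
    # Thread t3 ran and took 500 misses (a counter reading, assumed here).
    ready = {t: decay_footprint(f, cache_lines, 500) for t, f in ready.items()}
    print(pick_next(ready), ready)
```

Under these assumptions the scheduler only needs one multiplication per ready thread at each context switch, which is consistent with the low overhead the abstract claims for locality-based policies, though the actual cost depends on the real model.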