Guided self-scheduling: A practical scheduling scheme for parallel supercomputers
IEEE Transactions on Computers
PRESTO: a system for object-oriented parallel programming
Software—Practice & Experience
MASA: a multithreaded processor architecture for parallel symbolic computing
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Memory-reference characteristics of multiprocessor applications under MACH
SIGMETRICS '88 Proceedings of the 1988 ACM SIGMETRICS conference on Measurement and modeling of computer systems
The effect of sharing on the cache and bus performance of parallel programs
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Analysis of multithreaded architectures for parallel computing
SPAA '90 Proceedings of the second annual ACM symposium on Parallel algorithms and architectures
Techniques for efficient inline tracing on a shared-memory multiprocessor
SIGMETRICS '90 Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems
LimitLESS directories: A scalable cache coherence scheme
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
SPLASH: Stanford parallel applications for shared-memory
ACM SIGARCH Computer Architecture News
Using processor affinity in loop scheduling on shared-memory multiprocessors
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Working sets, cache sizes, and node granularity issues for large-scale multiprocessors
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
ICS '90 Proceedings of the 4th international conference on Supercomputing
APRIL: a processor architecture for multiprocessing
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Adaptive software cache management for distributed shared memory architectures
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Performance Tradeoffs in Multithreaded Processors
IEEE Transactions on Parallel and Distributed Systems
Conservative parallel discrete event simulation: principles and practice
Conservative parallel discrete event simulation: principles and practice
The effectiveness of multiple hardware contexts
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Increasing superscalar performance through multistreaming
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
The Named-State Register File: Implementation and Performance
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Multitasking and Multithreading on a Multiprocessor with Virtual Shared Memory
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Supermatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Thread scheduling for multi-core platforms
HOTOS'07 Proceedings of the 11th USENIX workshop on Hot topics in operating systems
Sharing-aware OS scheduling algorithms for multi-socket multi-core servers
IFMT '08 Proceedings of the 1st international forum on Next-generation multicore/manycore technologies
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Reinventing scheduling for multicore systems
HotOS'09 Proceedings of the 12th conference on Hot topics in operating systems
FACT: a framework for adaptive contention-aware thread migrations
Proceedings of the 8th ACM International Conference on Computing Frontiers
Matching memory access patterns and data placement for NUMA systems
Proceedings of the Tenth International Symposium on Code Generation and Optimization
A template library to integrate thread scheduling and locality management for NUMA multiprocessors
HotPar'12 Proceedings of the 4th USENIX conference on Hot Topics in Parallelism
Survey of scheduling techniques for addressing shared resources in multicore processors
ACM Computing Surveys (CSUR)
Hi-index | 0.00 |
Multithreaded architectures context switch between instruction streams to hide memory access latency. Although this improves processor utilization, it can increase cache interference and degrade overall performance. One technique to reduce the interconnect traffic is to co-locate threads that share data on the same processor. Multiple threads sharing in the cache should reduce compulsory and invalidation misses, thereby improving execution time.To test this hypothesis, we compared a variety of thread placement algorithms via trace-driven simulation of fourteen coarse- and medium-grain parallel applications on several multithreaded architectures. Our results contradict the hypothesis. Rather than decreasing, compulsory and invalidation misses remained nearly constant across all placement algorithms, for all processor configurations, even with an infinite cache. That is, sharing-based placement had no (positive) effect on execution time. Instead, load balancing was the critical factor that affected performance. Our results were due to one or both of the following reasons: (1) the sequential and uniform access of shared data by the application's threads and (2) the insignificant number of data references that require interconnect access, relative to the total number of instructions.