The effectiveness of multiple hardware contexts

Authors:
Radhika Thekkath;Susan J. Eggers
Affiliations:
Department of Computer Science and Engineering, FR-35, University of Washington, Seattle, WA;Department of Computer Science and Engineering, FR-35, University of Washington, Seattle, WA
Venue:
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Year:
1994

Citing 21
Cited 28

Guided self-scheduling: A practical scheduling scheme for parallel supercomputers

IEEE Transactions on Computers
PRESTO: a system for object-oriented parallel programming

Software—Practice & Experience
MASA: a multithreaded processor architecture for parallel symbolic computing

ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Exploring the benefits of multiple hardware contexts in a multiprocessor architecture: preliminary results

ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Analysis of multithreaded architectures for parallel computing

SPAA '90 Proceedings of the second annual ACM symposium on Parallel algorithms and architectures
Techniques for efficient inline tracing on a shared-memory multiprocessor

SIGMETRICS '90 Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems
LimitLESS directories: A scalable cache coherence scheme

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Comparative evaluation of latency reducing and tolerating techniques

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Strategies for achieving improved processor throughput

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
SPLASH: Stanford parallel applications for shared-memory

ACM SIGARCH Computer Architecture News
Improved multithreading techniques for hiding communication latency in multiprocessors

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Using processor affinity in loop scheduling on shared-memory multiprocessors

Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Impact of sharing-based thread placement on multithreaded architectures

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Complexity/performance tradeoffs with non-blocking loads

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
The Tera computer system

ICS '90 Proceedings of the 4th international conference on Supercomputing
APRIL: a processor architecture for multiprocessing

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Limits on Interconnection Network Performance

IEEE Transactions on Parallel and Distributed Systems
Performance Tradeoffs in Multithreaded Processors

IEEE Transactions on Parallel and Distributed Systems
Computing Per-Process Summary Side-Effect Information

Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing
Static Analysis of Barrier Synchronization in Explicitly Parallel Programs

PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques
Lockup-free instruction fetch/prefetch cache organization

ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture

Simultaneous multithreading: maximizing on-chip parallelism

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Single-program speculative multithreading (SPSM) architecture: compiler-assisted fine-grained multithreading

PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Informing memory operations: providing memory performance feedback in modern processors

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Limits on the performance benefits of multithreading and prefetching

Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Fine-grain multithreading with the EM-X multiprocessor

Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading

ACM Transactions on Computer Systems (TOCS)
Informing memory operations: memory performance feedback mechanisms and their applications

ACM Transactions on Computer Systems (TOCS)
Simultaneous multithreading: maximizing on-chip parallelism

25 years of the international symposia on Computer architecture (selected papers)
Effects of Multithreading on Cache Performance

IEEE Transactions on Computers - Special issue on cache memory and related problems
ILP versus TLP on SMT

SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Scheduled Dataflow: Execution Paradigm, Architecture, and Performance Evaluation

IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
Latency Tolerance: A Metric for Performance Analysis of Multithreaded Architectures

IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Data locality sensitivity of multithreaded computations on a distributed-memory multiprocessor

CASCON '96 Proceedings of the 1996 conference of the Centre for Advanced Studies on Collaborative research
Thread prioritization: a thread scheduling mechanism for multiple-context parallel processors

HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Efficient Real-Time Fine-Grained Concurrency on Low-Cost Microcontrollers

IEEE Micro
Balanced Multithreading: Increasing Throughput via a Low Cost Multithreading Hierarchy

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Optimizing NANOS OpenMP for the IBM Cyclops Multithreaded Architecture

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Mambo: a full system simulator for the PowerPC architecture

ACM SIGMETRICS Performance Evaluation Review - Special issue on tools for computer architecture research
Software thread integration for embedded system display applications

ACM Transactions on Embedded Computing Systems (TECS)
Fairness and Throughput in Switch on Event Multithreading

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Fairness enforcement in switch on event multithreading

ACM Transactions on Architecture and Code Optimization (TACO)
Exploiting multilevel parallelism using OpenMP on a massive multithreaded architecture

Journal of Embedded Computing - Issues in embedded single-chip multicore architectures
Enhancing Microkernel Performance on VLIW DSP Processors via Multiset Context Switch

Journal of Signal Processing Systems
Memory-level parallelism aware fetch policies for simultaneous multithreading processors

ACM Transactions on Architecture and Code Optimization (TACO)
Service level agreement for multithreaded processors

ACM Transactions on Architecture and Code Optimization (TACO)
Improving SMT performance scheduling processes

EUROMICRO-PDP'02 Proceedings of the 10th Euromicro conference on Parallel, distributed and network-based processing
A processor extension for cycle-accurate real-time software

EUC'06 Proceedings of the 2006 international conference on Embedded and Ubiquitous Computing
Multi-criteria checkpointing strategies: response-time versus resource utilization

Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

Multithreaded processors are used to tolerate long memory latencies. By executing threads loaded in multiple hardware contexts, an otherwise idle processor can keep busy, thus increasing its utilization. However, the larger size of a multi-thread working set can have a negative effect on cache conflict misses. In this paper we evaluate the two phenomena together, examining their combined effect on execution time.The usefulness of multiple hardware contexts depends on: program data locality, cache organization and degree of multiprocessing. Multiple hardware contexts are most effective on programs that have been optimized for data locality. For these programs, execution time dropped with increasing contexts, over widely varying architectures. With unoptimized applications, multiple contexts had limited value. The best performance was seen with only two contexts, and only on uniprocessors and small multiprocessors. The behavior of the unoptimized applications changed more noticeably with variations in cache associativity and cache hierarchy, unlike the optimized programs.As a mechanism for exploiting program parallelism, an additional processor is clearly better than another context. However, there were many configurations for which the addition of a few hardware contexts brought as much or greater performance than a larger multiprocessor with fewer than the optimal number of contexts.