Architecture of a message-driven processor
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Measurement and evaluation of the MIPS architecture and processor
ACM Transactions on Computer Systems (TOCS)
Toward a dataflow/von Neumann hybrid architecture
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
An evaluation of directory schemes for cache coherence
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
MASA: a multithreaded processor architecture for parallel symbolic computing
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
The parallel decomposition and implementation of an integrated circuit global router
PPEALS '88 Proceedings of the ACM/SIGPLAN conference on Parallel programming: experience with applications, languages and systems
The architecture and programming of the Ametek series 2010 multicomputer
C3P Proceedings of the third conference on Hypercube concurrent computers and applications: Architecture, software, computer systems, and general issues - Volume 1
Parallel implementation of OPS5 on the encore multiprocessor: results and analysis
International Journal of Parallel Programming
Characterization of parallelism and deadlocks in distributed digital logic simulation
DAC '89 Proceedings of the 26th ACM/IEEE Design Automation Conference
Reduced instruction set computers
Communications of the ACM - Special section on computer architecture
LocusRoute: a parallel global router for standard cells
DAC '88 Proceedings of the 25th ACM/IEEE Design Automation Conference
A critique of multiprocessing von Neumann style
ISCA '83 Proceedings of the 10th annual international symposium on Computer architecture
A critique of multiprocessing von Neumann style
ISCA '83 Proceedings of the 10th annual international symposium on Computer architecture
Analysis of multithreaded architectures for parallel computing
SPAA '90 Proceedings of the second annual ACM symposium on Parallel algorithms and architectures
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Comparative evaluation of latency reducing and tolerating techniques
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Multithreading: a revisionist view of dataflow architectures
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Hiding memory latency using dynamic scheduling in shared-memory multiprocessors
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
An elementary processor architecture with simultaneous instruction issuing from multiple threads
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Processor coupling: integrating compile time and runtime scheduling for parallelism
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Improved multithreading techniques for hiding communication latency in multiprocessors
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Reducing memory latency via non-blocking and prefetching caches
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Closing the window of vulnerability in multiphase memory transactions
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Register relocation: flexible contexts for multithreading
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Empirical study of latency hiding on a fine-grain parallel processor
ICS '93 Proceedings of the 7th international conference on Supercomputing
A survey of PRAM simulation techniques
ACM Computing Surveys (CSUR)
A case for the multithreaded processor architecture
ACM SIGARCH Computer Architecture News - Special issue on input/output in parallel computer systems
Impact of sharing-based thread placement on multithreaded architectures
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Interleaving: a multithreading technique targeting multiprocessors and workstations
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The effectiveness of multiple hardware contexts
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Architectural mechanisms for explicit communication in shared memory multiprocessors
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Increasing superscalar performance through multistreaming
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Analysis of communications and overhead reduction in multithreaded execution
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Proceedings of the 28th annual international symposium on Microarchitecture
Evaluation of multithreaded uniprocessors for commercial application environments
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading
ACM Transactions on Computer Systems (TOCS)
Monsoon: an explicit token-store architecture
25 years of the international symposia on Computer architecture (selected papers)
Simultaneous multithreading: maximizing on-chip parallelism
25 years of the international symposia on Computer architecture (selected papers)
Concurrent Event Handling through Multithreading
IEEE Transactions on Computers
Monsoon: an explicit token-store architecture
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
APRIL: a processor architecture for multiprocessing
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
PLUS: a distributed shared-memory system
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Cache Memories for Dataflow Systems
IEEE Parallel & Distributed Technology: Systems & Technology
Performance Tradeoffs in Multithreaded Processors
IEEE Transactions on Parallel and Distributed Systems
Performance Analysis of Mesh Interconnection Networks with Deterministic Routing
IEEE Transactions on Parallel and Distributed Systems
Performance Analysis of Four Memory Consistency Models for Multithreaded Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Latency Tolerance: A Metric for Performance Analysis of Multithreaded Architectures
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Analytic Performance Modeling for a Spectrum of Multithreaded Processor Architectures
MASCOTS '95 Proceedings of the 3rd International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems
Data locality sensitivity of multithreaded computations on a distributed-memory multiprocessor
CASCON '96 Proceedings of the 1996 conference of the Centre for Advanced Studies on Collaborative research
Thread prioritization: a thread scheduling mechanism for multiple-context parallel processors
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
The Named-State Register File: Implementation and Performance
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Multitasking and Multithreading on a Multiprocessor with Virtual Shared Memory
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Measurement and Modeling of EARTH-MANNA Multithreaded Architecture
MASCOTS '96 Proceedings of the 4th International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems
Timed Petri net models of multithreaded multiprocessor architectures
PNPM '97 Proceedings of the 6th International Workshop on Petri Nets and Performance Models
Enhancing Microkernel Performance on VLIW DSP Processors via Multiset Context Switch
Journal of Signal Processing Systems
Hybrid multithreading for VLIW processors
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
Hi-index | 0.00 |
A fundamental problem that any scalable multiprocessor must address is the ability to tolerate high latency memory operations. This paper explores the extent to which multiple hardware contexts per processor can help to mitigate the negative effects of high latency. In particular, we evaluate the performance of a directory-based cache coherent multiprocessor using memory reference traces obtained from three parallel applications. We explore the case where there are a small fixed number (2-4) of hardware contexts per processor and the context switch overhead is low. In contrast to previously proposed approaches, we also use a very simple context switch criterion, namely a cache miss or a write-hit to shared data. Our results show that the effectiveness of multiple contexts depends on the nature of the applications, the context switch overhead, and the inherent latency of the machine architecture. Given reasonably low overhead hardware context switches, we show that two or four contexts can achieve substantial performance gains over a single context. For one application, the processor utilization increased by about 46% with two contexts and by about 80% with four contexts.