Global register allocation at link time
SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction
Architecture of a message-driven processor
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
ACM Transactions on Computer Systems (TOCS)
Cache performance of operating system and multiprogramming workloads
ACM Transactions on Computer Systems (TOCS)
Toward a dataflow/von Neumann hybrid architecture
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Multiprocessor cache analysis using ATUM
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
MASA: a multithreaded processor architecture for parallel symbolic computing
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
A simple interprocedural register allocation algorithm and its effectiveness for LISP
ACM Transactions on Programming Languages and Systems (TOPLAS)
A processor architecture for horizon
Proceedings of the 1988 ACM/IEEE conference on Supercomputing
ACM Transactions on Computer Systems (TOCS)
The Stack Growth Function: Cache Line Reference Models
IEEE Transactions on Computers
IEEE Transactions on Computers
Can dataflow subsume von Neumann computing?
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Interconnection networks for large-scale parallel processing: theory and case studies (2nd ed.)
Interconnection networks for large-scale parallel processing: theory and case studies (2nd ed.)
Analysis of multithreaded architectures for parallel computing
SPAA '90 Proceedings of the second annual ACM symposium on Parallel algorithms and architectures
LimitLESS directories: A scalable cache coherence scheme
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Closing the window of vulnerability in multiphase memory transactions
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
APRIL: a processor architecture for multiprocessing
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
ACM Computing Surveys (CSUR)
Postpass Code Optimization of Pipeline Constraints
ACM Transactions on Programming Languages and Systems (TOPLAS)
Dynamic space-sharing in computer systems
Communications of the ACM
The working set model for program behavior
Communications of the ACM
Limits on Interconnection Network Performance
IEEE Transactions on Parallel and Distributed Systems
A multiple stream microprocessor prototype system: AMP-1
ISCA '80 Proceedings of the 7th annual symposium on Computer Architecture
A large scale, homogeneous, fully distributed parallel machine, I
ISCA '77 Proceedings of the 4th annual symposium on Computer architecture
PERFORMANCE TRADEOFFS IN MULTITHREADED PROCESSORS
PERFORMANCE TRADEOFFS IN MULTITHREADED PROCESSORS
Reduced instruction set computer architectures for vlsi (microprocessor, risc, multiple-windows - of - registers)
Register relocation: flexible contexts for multithreading
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
A case for the multithreaded processor architecture
ACM SIGARCH Computer Architecture News - Special issue on input/output in parallel computer systems
Impact of sharing-based thread placement on multithreaded architectures
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Interleaving: a multithreading technique targeting multiprocessors and workstations
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The effectiveness of multiple hardware contexts
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Future applicability of bus-based shared memory multiprocessors
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Data forwarding in scalable shared-memory multiprocessors
ICS '95 Proceedings of the 9th international conference on Supercomputing
Limits on the performance benefits of multithreading and prefetching
Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Data Forwarding in Scalable Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading
ACM Transactions on Computer Systems (TOCS)
Simultaneous multithreading: maximizing on-chip parallelism
25 years of the international symposia on Computer architecture (selected papers)
Memory sharing predictor: the key to a speculative coherent DSM
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Asynchrony in parallel computing: from dataflow to multithreading
Progress in computer research
Asynchrony in parallel computing: from dataflow to multithreading
Progress in computer research
A Simulation Study of Decoupled Vector Architectures
The Journal of Supercomputing
Performance Evaluation of Hierarchical Ring-Based Shared Memory Multiprocessors
IEEE Transactions on Computers
Practical Delay Enforced Multistream (DEMUS) Control of Deeply Pipelined Processors
IEEE Transactions on Computers
Performance Analysis of Four Memory Consistency Models for Multithreaded Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Analysis of performance bottlenecks in multithreaded multiprocessor systems
Fundamenta Informaticae - Application of concurrency to system design
Latency Tolerance: A Metric for Performance Analysis of Multithreaded Architectures
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
A Queuing Model of a Multi-threaded Architecture: A Case Study
PaCT '999 Proceedings of the 5th International Conference on Parallel Computing Technologies
Analytic Performance Modeling for a Spectrum of Multithreaded Processor Architectures
MASCOTS '95 Proceedings of the 3rd International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems
Computer Networks: The International Journal of Computer and Telecommunications Networking - Network processors
Thread prioritization: a thread scheduling mechanism for multiple-context parallel processors
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Decoupled vector architectures
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Performance Study of a Multithreaded Superscalar Microprocessor
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Performance Impacts of Caching I-Structure Data on Frame-Based Multithreaded Processing
HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
Performance Models for Network Processor Design
IEEE Transactions on Parallel and Distributed Systems
Compiler optimization to improve data locality for processor multithreading
Scientific Programming
Performance/area efficiency in chip multiprocessors with micro-caches
Proceedings of the 4th international conference on Computing frontiers
A dynamically reconfigurable cache for multithreaded processors
Journal of Embedded Computing - Issues in embedded single-chip multicore architectures
Analytic modeling of network processors for parallel workload mapping
ACM Transactions on Embedded Computing Systems (TECS)
International Journal of High Performance Computing Applications
International Journal of Modelling and Simulation
Performance limitations of block-multithreaded distributed-memory systems
Winter Simulation Conference
Performance and power aware CMP thread allocation modeling
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Analysis of Performance Bottlenecks in Multithreaded Multiprocessor Systems
Fundamenta Informaticae - Application of Concurrency to System Design
Prototyping hardware support for irregular applications
Proceedings of the 2013 Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools
On modeling contention for shared caches in multi-core processors with techniques from ecology
Natural Computing: an international journal
Hi-index | 0.01 |
An analytical performance model for multithreaded processors that includes cache interference, network contention, context-switching overhead, and data-sharing effects is presented. The model is validated through the author's simulations and by comparison with previously published simulation results. The results indicate that processors can substantially benefit from multithreading, even in systems with small caches, provided sufficient network bandwidth exists. Caches that are much larger than the working-set sizes of individual processes yield close to full processor utilization with as few as two to four contexts. Smaller caches require more contexts to keep the processor busy, while caches that are comparable in size to the working-sets of individual processes cannot achieve a high utilization regardless of the number of contexts. Increased network contention due to multithreading has a major effect on performance. The available network bandwidth and the context-switching overhead limits the best possible utilization.