Petri nets: an introduction
An architecture for software-controlled data prefetching
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Improved multithreading techniques for hiding communication latency in multiprocessors
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Software support for speculative loads
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
A performance study of software and hardware data prefetching schemes
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Multithreaded processor architectures
IEEE Spectrum
Proceedings of the 27th annual international symposium on Computer architecture
Performance Tradeoffs in Multithreaded Processors
IEEE Transactions on Parallel and Distributed Systems
Analysis of performance bottlenecks in multithreaded multiprocessor systems
Fundamenta Informaticae - Application of concurrency to system design
A survey of processors with explicit multithreading
ACM Computing Surveys (CSUR)
Design and performance evaluation of a multithreaded architecture
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Timed Petri net models of multithreaded multiprocessor architectures
PNPM '97 Proceedings of the 6th International Workshop on Petri Nets and Performance Models
High-Performance Throughput Computing
IEEE Micro
Hi-index | 0.00 |
The performance of modern computer systems is increasingly often limited by long latencies of accesses to the memory subsystems. Instruction-level multithreading is an architectural approach to tolerating such long latencies by switching instruction threads rather than waiting for the completion of memory operations. The paper studies performance limitations in distributed-memory block multithreaded systems and determines conditions for such systems to be balanced. Event-driven simulation of a timed Petri net model of a simple distributed-memory system confirms the derived performance results.