Performance limitations of block-multithreaded distributed-memory systems
Winter Simulation Conference
The performance of modern multiprocessor systems is often limited by interconnection delays or by long latencies of memory subsystems. Instruction-level multithreading tolerates such latencies by switching from one instruction thread to another, so that instruction execution continues concurrently with the long-latency operations. Using timed Petri net models, the paper analyzes the performance limitations introduced by different components of distributed-memory multithreaded multiprocessor systems. Simulation results are used to compare the performance improvements obtained by replicating critical components of the system with those obtained by using components with better performance characteristics.
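To make the latency-hiding idea concrete, the following is a minimal sketch of a textbook deterministic utilization model for a multithreaded processor. It is not the timed Petri net model used in the paper. The function name and the parameters (`n_threads`, `run_length`, `latency`, `switch_cost`) are assumptions introduced here for illustration only. The model assumes each thread runs for a fixed number of cycles, then issues one long-latency operation, and that the processor switches to another ready thread at a fixed cost.

```python
def utilization(n_threads, run_length, latency, switch_cost=0.0):
    """Fraction of cycles spent on useful work (simplified deterministic model).

    n_threads   -- number of hardware threads (hypothetical parameter)
    run_length  -- cycles a thread executes before a long-latency operation
    latency     -- cycles the long-latency operation takes
    switch_cost -- cycles lost on each context switch
    """
    if n_threads < 1 or run_length <= 0 or latency < 0 or switch_cost < 0:
        raise ValueError("invalid model parameters")
    # Linear region: too few threads to cover the latency, so the
    # processor idles for part of every cycle of length R + C + L.
    linear = n_threads * run_length / (run_length + switch_cost + latency)
    # Saturated region: latency is fully hidden, only switch cost is lost.
    saturated = run_length / (run_length + switch_cost)
    return min(linear, saturated)


if __name__ == "__main__":
    for n in (1, 2, 4, 5, 8):
        print(n, round(utilization(n, run_length=10, latency=40), 3))
```

Under these assumptions, utilization grows linearly with the number of threads until the latency is fully overlapped, after which adding threads brings no further gain. That saturation point is where some other component, rather than the thread count, becomes the limiting factor.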