Processor architecture is undergoing a significant change in response to the rapidly escalating complexity of high-power, high-frequency, and increasingly superscalar designs. Evolutionary multi-core chips and aggressively multi-threaded chips are appearing in the general-purpose microprocessor space. The latter offer simplicity, low power, and high performance on threaded workloads, at the cost of somewhat reduced single-thread performance. This paper examines the performance of the SPARC64(TM) VI, a dual-core, 4-thread processor, and the UltraSPARC(TM) T1, an 8-core, 32-thread processor. Numerous workloads are executed on both designs, including single-thread speed tests, homogeneous throughput tests, and multi-threaded tests using varying amounts of data and parallelism. The results show a clear separation between the workloads best suited to each design. To reap the full benefit of these multi-threaded designs, software must be architected to use as many threads as possible. This shift is likely to affect both software developers and compiler writers for the next several years.
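A homogeneous throughput test runs many copies of the same single-threaded job at once and reports aggregate completions per unit time, which is where chips with many hardware threads are expected to shine. The Python sketch below illustrates the idea under stated assumptions: the kernel (chained SHA-256 over a 64 KiB buffer), the names `kernel` and `throughput_test`, and the copy counts are hypothetical illustrations, not the workloads used in the paper. It relies on CPython's `hashlib` releasing the GIL for inputs larger than 2047 bytes, so the worker threads can genuinely overlap; `os.cpu_count()` reports logical CPUs, which on multi-threaded chips normally counts each hardware thread.

```python
import hashlib
import os
import time
from concurrent.futures import ThreadPoolExecutor


def kernel(seed: int, rounds: int = 200) -> str:
    """One copy of a hypothetical single-threaded job: chained SHA-256 over 64 KiB.

    CPython's hashlib releases the GIL while hashing inputs larger than
    2047 bytes, so several threads running this kernel can overlap on
    separate hardware threads.
    """
    block = seed.to_bytes(8, "little") * 8192  # 8 bytes * 8192 = 64 KiB
    digest = b""
    for _ in range(rounds):
        # Each round hashes the block plus the previous digest, so rounds are serially dependent.
        digest = hashlib.sha256(block + digest).digest()
    return digest.hex()


def throughput_test(copies: int, workers: int) -> tuple[list[str], float]:
    """Run `copies` equal-sized jobs on a pool of `workers` threads.

    Returns the per-copy results (in submission order) and the aggregate
    throughput in completed copies per second.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(kernel, range(copies)))
    elapsed = time.perf_counter() - start
    return results, copies / elapsed


if __name__ == "__main__":
    hw_threads = os.cpu_count() or 1  # logical CPUs, i.e. hardware threads
    _, one = throughput_test(copies=hw_threads, workers=1)
    _, many = throughput_test(copies=hw_threads, workers=hw_threads)
    print(f"{hw_threads} copies: {one:.1f}/s on 1 thread, "
          f"{many:.1f}/s on {hw_threads} threads")
```

Comparing the two rates shows how much the machine's hardware threads raise aggregate throughput for this kernel. A CPU-bound kernel written in pure Python would not scale this way under the GIL, which is why the sketch leans on `hashlib`; a native-code harness would not need that workaround.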