The coming wave of multithreaded chip multiprocessors

Authors:
James Laudon; Lawrence Spracklen
Affiliations:
Sun Microsystems, Inc., Santa Clara, CA; Sun Microsystems, Inc., Network Circle, Santa Clara, CA
Venue:
International Journal of Parallel Programming
Year:
2007

Abstract

The performance of microprocessors has increased exponentially for over 35 years. However, process technology challenges, chip power constraints, and difficulty in extracting instruction-level parallelism are conspiring to limit the performance of future individual processors. To address these limits, the computer industry has embraced chip multiprocessing (CMP), predominantly in the form of multiple high-performance superscalar processors on the same die. We explore the trade-off between building CMPs from a few high-performance cores and building CMPs from a large number of lower-performance cores, and argue that CMPs built from a larger number of lower-performance cores can provide better performance and performance/Watt on many commercial workloads. We examine two multithreaded CMPs built using a large number of processor cores: Sun's Niagara and Niagara 2 processors. We also explore the programming issues for CMPs with a large number of threads. The programming model for these CMPs is similar to the widely used programming model for symmetric multiprocessors (SMPs), but the greatly reduced cost of communicating data through the on-chip shared secondary cache allows more fine-grained parallelism to be effectively exploited by the CMP. Finally, we present performance comparisons between Sun's Niagara and more conventional dual-core processors built from large superscalar processor cores. For several key server workloads, Niagara shows significant performance and even more significant performance/Watt advantages over the CMPs built from traditional superscalar processors.