Performance/Watt: the new server focus

Authors:
James Laudon
Affiliations:
Sun Microsystems
Venue:
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Year:
2005

Citing 17
Cited 12

Interleaving: a multithreading technique targeting multiprocessors and workstations

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Simultaneous multithreading: maximizing on-chip parallelism

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Improving data cache performance by pre-executing instructions under a cache miss

ICS '97 Proceedings of the 11th international conference on Supercomputing
Memory system characterization of commercial workloads

Proceedings of the 25th annual international symposium on Computer architecture
An analysis of database workload performance on simultaneous multithreaded processors

Proceedings of the 25th annual international symposium on Computer architecture
Piranha: a scalable architecture based on single-chip multiprocessing

Proceedings of the 27th annual international symposium on Computer architecture
The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Transient-fault recovery using simultaneous multithreading

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Detailed design and evaluation of redundant multithreading alternatives

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Sparcle: An Evolutionary Processor Design for Large-Scale Multiprocessors

IEEE Micro
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Design and implementation of the POWER5™ microprocessor

Proceedings of the 41st annual Design Automation Conference
Niagara: A 32-Way Multithreaded Sparc Processor

IEEE Micro
High-Performance Throughput Computing

IEEE Micro
Maximizing CMP Throughput with Mediocre Cores

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
The Price of Performance

Queue - Multiprocessors
A performance methodology for commercial servers

IBM Journal of Research and Development

Area-Performance Trade-offs in Tiled Dataflow Architectures

Proceedings of the 33rd annual international symposium on Computer Architecture
PicoServer: using 3D stacking technology to enable a compact energy efficient chip multiprocessor

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
JouleSort: a balanced energy-efficiency benchmark

Proceedings of the 2007 ACM SIGMOD international conference on Management of data
QoS policies and architecture for cache/memory in CMP platforms

Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Exploring Large-Scale CMP Architectures Using ManySim

IEEE Micro
The coming wave of multithreaded chip multiprocessors

International Journal of Parallel Programming
PicoServer: Using 3D stacking technology to build energy efficient servers

ACM Journal on Emerging Technologies in Computing Systems (JETC)
Dynamic heterogeneity and the need for multicore virtualization

ACM SIGOPS Operating Systems Review
Platform-aware bottleneck detection for reconfigurable computing applications

ACM Transactions on Reconfigurable Technology and Systems (TRETS)
From the origins of performance evaluation to new green ICT performance engineering

PERFORM'10 Proceedings of the 2010 IFIP WG 6.3/7.3 international conference on Performance Evaluation of Computer and Communication Systems: milestones and future challenges
Carbon nanotube circuits: opportunities and challenges

Proceedings of the Conference on Design, Automation and Test in Europe
Rapid exploration of processing and design guidelines to overcome carbon nanotube variations

Proceedings of the 50th Annual Design Automation Conference

Quantified Score

Hi-index	0.00

Visualization

Abstract

Transaction processing has emerged as the killer application for commercial servers. Most servers are engaged in transactional workloads such as processing search requests, serving middleware, evaluating decisions, managing databases, and powering online commerce. Currently, commercial servers are built from one or more high-performance superscalar processors. However, commercial server applications exhibit high cache miss rates, large memory footprints, and low instruction level parallelism (ILP), which leads to poor utilization on traditional ILP-focused superscalar processors [11]. In addition, these ILP-focused processors have been primarily optimized to deliver maximum performance by employing high clock rates and large amounts of speculation. As a result, we are now at the point where the performance/Watt of subsequent generations of traditional ILP-focused processors on server workloads has been flat [4] or even decreasing. The lack of increase in processor performance/Watt, coupled with the continued decrease in server hardware acquisition costs and likely increases in future power and cooling costs is leading to a situation where total cost of server ownership will soon be predominately determined by power [4]. In this paper, we argue that attacking thread-level parallelism (TLP) via a large number of simple cores on a chip multiprocessor (CMP) leads to much better performance/Watt for server workloads. As a case study, we compare Sun's TLP-oriented Niagara processor against the ILP-oriented dual-core Pentium Extreme Edition from Intel, showing that the Niagara processor has a significant performance/Watt advantage for throughput-oriented server applications.