Wimpy or brawny cores: A throughput perspective

Authors:
Xiangyang Liang;Minh Nguyen;Hao Che
Venue:
Journal of Parallel and Distributed Computing
Year:
2013

Abstract

In this paper, we conduct a coarse-grained comparative analysis of wimpy (i.e., simple) fine-grain multicore processors against brawny (i.e., complex) simultaneous multithreaded (SMT) multicore processors for server applications with strong request-level parallelism. We explore a large design space along multiple dimensions, including the number of cores, the number of threads, and a wide range of workloads. For strongly CPU-bound workloads, a 2R-core wimpy-multicore processor is found to be on par with an R-core brawny-multicore processor in terms of throughput. For strongly memory-bound workloads, core-level multithreading is largely ineffective for both wimpy-multicore and brawny-multicore processors, except when the core and thread counts per memory/disk interface are low. For both wimpy-multicore and brawny-multicore processors, there is an optimal number of cores at which throughput peaks, and this optimum decreases as the workload becomes more deeply memory-bound. Moreover, there is a threshold core count for a wimpy-multicore processor beyond which it is outperformed by its brawny-multicore counterpart. These behaviors indicate that brawny-multicore processors are better choices than wimpy-multicore processors in terms of throughput.