In this paper, we conduct a coarse-grained comparative analysis of wimpy (i.e., simple) fine-grain multicore processors against brawny (i.e., complex) simultaneous multithreaded (SMT) multicore processors for server applications with strong request-level parallelism. We explore a large design space along multiple dimensions, including the number of cores, the number of threads, and a wide range of workloads. For strongly CPU-bound workloads, a 2R-core wimpy-multicore processor is on par with an R-core brawny-multicore processor in throughput. For strongly memory-bound workloads, core-level multithreading is largely ineffective for both wimpy-multicore and brawny-multicore processors, except when the core and thread counts per memory/disk interface are low. For both wimpy-multicore and brawny-multicore processors, there is an optimal core count at which throughput peaks, and this optimum decreases as the workload becomes more deeply memory-bound. Moreover, there is a threshold core count for a wimpy-multicore processor beyond which it is outperformed by its brawny-multicore counterpart. These behaviors indicate that brawny-multicore processors are better choices than wimpy-multicore processors in terms of throughput.