Interleaving: a multithreading technique targeting multiprocessors and workstations
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The case for a single-chip multiprocessor
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Memory system characterization of commercial workloads
Proceedings of the 25th annual international symposium on Computer architecture
An analysis of database workload performance on simultaneous multithreaded processors
Proceedings of the 25th annual international symposium on Computer architecture
Performance of database workloads on shared-memory systems with out-of-order processors
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Piranha: a scalable architecture based on single-chip multiprocessing
Proceedings of the 27th annual international symposium on Computer architecture
Exploring the Design Space of Future CMPs
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Variability in Architectural Simulations of Multi-Threaded Workloads
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Proceedings of the 30th annual international symposium on Computer architecture
Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance
Proceedings of the 31st annual international symposium on Computer architecture
Adaptive Cache Compression for High-Performance Processors
Proceedings of the 31st annual international symposium on Computer architecture
Chip Multithreading: Opportunities and Challenges
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
ACM SIGMETRICS Performance Evaluation Review - Special issue on tools for computer architecture research
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
A performance methodology for commercial servers
IBM Journal of Research and Development
Performance/Watt: the new server focus
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
The RASE (Rapid, Accurate Simulation Environment) for chip multiprocessors
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
A chip prototyping substrate: the flexible architecture for simulation and testing (FAST)
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Reducing power through compiler-directed barrier synchronization elimination
Proceedings of the 2006 international symposium on Low power electronics and design
Efficiently exploring architectural design spaces via predictive modeling
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Fairness and Throughput in Switch on Event Multithreading
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Leveraging Optical Technology in Future Bus-based Chip Multiprocessors
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Performance/area efficiency in chip multiprocessors with micro-caches
Proceedings of the 4th international conference on Computing frontiers
Scheduling threads for constructive cache sharing on CMPs
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Comparing memory systems for chip multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
Fairness enforcement in switch on event multithreading
ACM Transactions on Architecture and Code Optimization (TACO)
Evaluating design tradeoffs in on-chip power management for CMPs
ISLPED '07 Proceedings of the 2007 international symposium on Low power electronics and design
Efficient architectural design space exploration via predictive modeling
ACM Transactions on Architecture and Code Optimization (TACO)
Hiding the misprediction penalty of a resource-efficient high-performance processor
ACM Transactions on Architecture and Code Optimization (TACO)
The coming wave of multithreaded chip multiprocessors
International Journal of Parallel Programming
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Federation: repurposing scalar cores for out-of-order instruction issue
Proceedings of the 45th annual Design Automation Conference
Predictive design space exploration using genetically programmed response surfaces
Proceedings of the 45th annual Design Automation Conference
Distributed cooperative caching
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Comparative evaluation of memory models for chip multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
Dynamic power management framework for multi-core portable embedded system
IFMT '08 Proceedings of the 1st international forum on Next-generation multicore/manycore technologies
Shore-MT: a scalable storage manager for the multicore era
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Reactive NUCA: near-optimal block placement and replication in distributed caches
Proceedings of the 36th annual international symposium on Computer architecture
Exploration of 3D stacked L2 cache design for high performance and efficient thermal control
Proceedings of the 14th ACM/IEEE international symposium on Low power electronics and design
Global management of cache hierarchies
Proceedings of the 7th ACM international conference on Computing frontiers
Architecture performance prediction using evolutionary artificial neural networks
Evo'08 Proceedings of the 2008 conference on Applications of evolutionary computing
Avoiding cache thrashing due to private data placement in last-level cache for manycore scaling
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Web search using mobile cores: quantifying and mitigating the price of efficiency
Proceedings of the 37th annual international symposium on Computer architecture
Proceedings of the 37th annual international symposium on Computer architecture
Understanding throughput-oriented architectures
Communications of the ACM
Federation: Boosting per-thread performance of throughput-oriented manycore architectures
ACM Transactions on Architecture and Code Optimization (TACO)
CPM in CMPs: Coordinated Power Management in Chip-Multiprocessors
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs?
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
A NoC-based hybrid message-passing/shared-memory approach to CMP design
Microprocessors & Microsystems
A minimalist cache coherent MPSoC designed for FPGAs
International Journal of High Performance Systems Architecture
Mobile processors for energy-efficient web search
ACM Transactions on Computer Systems (TOCS)
Clearing the clouds: a study of emerging scale-out workloads on modern hardware
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Performance/Thermal-Aware Design of 3D-Stacked L2 Caches for CMPs
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Proceedings of the 2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Proceedings of the 39th Annual International Symposium on Computer Architecture
PEPON: performance-aware hierarchical power budgeting for NoC based multicores
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Quantifying the Mismatch between Emerging Scale-Out Applications and Modern Processors
ACM Transactions on Computer Systems (TOCS)
ACSAC'07 Proceedings of the 12th Asia-Pacific conference on Advances in Computer Systems Architecture
Efficient Reuse Distance Analysis of Multicore Scaling for Loop-Based Parallel Programs
ACM Transactions on Computer Systems (TOCS)
On understanding the energy consumption of ARM-based multicore servers
Proceedings of the ACM SIGMETRICS/international conference on Measurement and modeling of computer systems
Market mechanisms for managing datacenters with heterogeneous microarchitectures
ACM Transactions on Computer Systems (TOCS)
Eliminating unscalable communication in transaction processing
The VLDB Journal — The International Journal on Very Large Data Bases
Hi-index | 0.02 |
In this paper we compare the performance of area equivalent small, medium, and large-scale multithreaded chip multiprocessors (CMTs) using throughput-oriented applications. We use area models based on SPARC processors incorporating these architectural features. We examine CMTs with inorder scalar processor cores, 2-way or 4-way in-order superscalar cores, private primary instruction and data caches, and a shared secondary cache. We explore a large design space, ranging from processor-intensive to cache-intensive CMTs. We use SPEC JBB2000, TPCC, TPC-W, and XML Test to demonstrate that the scalar simple-core CMTs do a better job of addressing the problems of low instruction-level parallelism and high cache miss rates that dominate web-service middleware and online transaction processing applications. For the best overall CMT performance, smaller cores with lower performance, so called "mediocre" cores, maximize the total number of CMT cores and outperform CMTs built from larger, higher performance cores.