Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Evaluation of multithreaded uniprocessors for commercial application environments
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
The case for a single-chip multiprocessor
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
ICS '90 Proceedings of the 4th international conference on Supercomputing
Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading
ACM Transactions on Computer Systems (TOCS)
Memory system characterization of commercial workloads
Proceedings of the 25th annual international symposium on Computer architecture
Performance characterization of a Quad Pentium Pro SMP using OLTP workloads
Proceedings of the 25th annual international symposium on Computer architecture
An analysis of database workload performance on simultaneous multithreaded processors
Proceedings of the 25th annual international symposium on Computer architecture
DBMSs on a Modern Processor: Where Does Time Go?
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Initial Observations of the Simultaneous Multithreading Pentium 4 Processor
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
SimICS/sun4m: a virtual workstation
ATEC '98 Proceedings of the annual conference on USENIX Annual Technical Conference
A multithreaded PowerPC processor for commercial servers
IBM Journal of Research and Development
An overview of the Sam CMT simulator kit
An overview of the Sam CMT simulator kit
Communist, utilitarian, and capitalist cache policies on CMPs: caches as a shared resource
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Balancing power consumption in multiprocessor systems
Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006
Scheduling threads for constructive cache sharing on CMPs
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Cooperative cache partitioning for chip multiprocessors
Proceedings of the 21st annual international conference on Supercomputing
Do commodity SMT processors need more OS research?
ACM SIGOPS Operating Systems Review
Provably good multicore cache performance for divide-and-conquer algorithms
Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
Processor hardware counter statistics as a first-class system resource
HOTOS'07 Proceedings of the 11th USENIX workshop on Hot topics in operating systems
Efficient operating system scheduling for performance-asymmetric multi-core architectures
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
PAM: a novel performance/power aware meta-scheduler for multi-core systems
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Exploration of the Influence of Program Inputs on CMP Co-scheduling
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
Analysis and approximation of optimal co-scheduling on chip multiprocessors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
MCC-DB: minimizing cache conflicts in multi-core processors for databases
Proceedings of the VLDB Endowment
COMPASS: a programmable data prefetcher using idle GPU shaders
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Thread allocation in CMP-based multithreaded network processors
Parallel Computing
Contention aware execution: online contention detection and response
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
An approach to resource-aware co-scheduling for CMPs
Proceedings of the 24th ACM International Conference on Supercomputing
Dynamic workload characterization for power efficient scheduling on CMP systems
Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design
Cache Performance Optimization for Processing XML-Based Application Data on Multi-core Processors
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
Dynamically managed multithreaded reconfigurable architectures for chip multiprocessors
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Exploiting unbalanced thread scheduling for energy and performance on a CMP of SMT processors
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Compatible phase co-scheduling on a CMP of multi-threaded processors
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Proposal and evaluation of APIs for utilizing inter-core time aggregation scheduler
JSSPP'10 Proceedings of the 15th international conference on Job scheduling strategies for parallel processing
Fast modeling of shared caches in multicore systems
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Proceedings of the sixth conference on Computer systems
Proceedings of the international symposium on Memory management
Energy efficient scheduling for multithreaded programs on general-purpose processors
Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design
Optimal task assignment in multithreaded processors: a statistical approach
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Region scheduling: efficiently using the cache architectures via page-level affinity
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Combining locality analysis with online proactive job co-scheduling in chip multiprocessors
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Performance and power aware CMP thread allocation modeling
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Phase guided profiling for fast cache modeling
Proceedings of the Tenth International Symposium on Code Generation and Optimization
A Machine Learning Based Meta-Scheduler for Multi-Core Processors
International Journal of Adaptive, Resilient and Autonomic Systems
Uncovering CPU load balancing policies with harmony
Proceedings of the ACM International Conference on Computing Frontiers
Hi-index | 0.01 |
We investigated how operating system design should be adapted for multithreaded chip multiprocessors (CMT) - a new generation of processors that exploit thread-level parallelism to mask the memory latency in modern workloads. We determined that the L2 cache is a critical shared resource on CMT and that an insufficient amount of L2 cache can undermine the ability to hide memory latency on these processors. To use the L2 cache as efficiently as possible, we propose an L2-conscious scheduling algorithm and quantify its performance potential. Using this algorithm it is possible to reduce miss ratios in the L2 cache by 25-37% and improve processor throughput by 27-45%.