MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Understanding some simple processor-performance limits
IBM Journal of Research and Development - Special issue: performance analysis and its impact on design
Symbiotic jobscheduling for a simultaneous multithreaded processor
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance
Proceedings of the 31st annual international symposium on Computer architecture
Microarchitecture Optimizations for Exploiting Memory-Level Parallelism
Proceedings of the 31st annual international symposium on Computer architecture
Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Scheduling for heterogeneous processors in server systems
Proceedings of the 2nd conference on Computing frontiers
Introduction to the cell multiprocessor
IBM Journal of Research and Development - POWER5 and packaging
Efficient operating system scheduling for performance-asymmetric multi-core architectures
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
HASS: a scheduler for heterogeneous multicore systems
ACM SIGOPS Operating Systems Review
A mechanistic performance model for superscalar out-of-order processors
ACM Transactions on Computer Systems (TOCS)
Efficient program scheduling for heterogeneous multi-core processors
Proceedings of the 46th Annual Design Automation Conference
Age based scheduling for asymmetric multiprocessors
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Bias scheduling in heterogeneous multi-core architectures
Proceedings of the 5th European conference on Computer systems
High performance cache replacement using re-reference interval prediction (RRIP)
Proceedings of the 37th annual international symposium on Computer architecture
Scalable thread scheduling and global power management for heterogeneous many-core architectures
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Efficient interaction between OS and architecture in heterogeneous platforms
ACM SIGOPS Operating Systems Review
Efficiently exploiting memory level parallelism on asymmetric coupled cores in the dark silicon era
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
CRUISE: cache replacement and utility-aware scheduling
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Phase-based scheduling and thread migration for heterogeneous multicore processors
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Understanding fundamental design choices in single-ISA heterogeneous multicore architectures
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Composite Cores: Pushing Heterogeneity Into a Core
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Utility-based acceleration of multithreaded applications on asymmetric CMPs
Proceedings of the 40th Annual International Symposium on Computer Architecture
Hierarchical power management for asymmetric multi-core in dark silicon era
Proceedings of the 50th Annual Design Automation Conference
An opportunistic prediction-based thread scheduling to maximize throughput/watt in AMPs
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Fairness-aware scheduling on single-ISA heterogeneous multi-cores
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
When slower is faster: on heterogeneous multicores for reliable systems
USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference
Trace based phase prediction for tightly-coupled heterogeneous cores
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Price theory based power management for heterogeneous multi-cores
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Power-performance modeling on asymmetric multi-cores
Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems
Energy-aware thread co-location in heterogeneous multicore processors
Proceedings of the Eleventh ACM International Conference on Embedded Software
QoS-Aware scheduling in heterogeneous datacenters with paragon
ACM Transactions on Computer Systems (TOCS)
The Journal of Supercomputing
Trace alignment algorithms for offline workload analysis of heterogeneous architectures
Proceedings of the International Conference on Computer-Aided Design
ad-heap: an Efficient Heap Data Structure for Asymmetric Multicore Processors
Proceedings of Workshop on General Purpose Processing Using GPUs
Adaptive workload-aware task scheduling for single-ISA asymmetric multicore architectures
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
Single-ISA heterogeneous multi-core processors are typically composed of small (e.g., in-order) power-efficient cores and big (e.g., out-of-order) high-performance cores. The effectiveness of heterogeneous multi-cores depends on how well a scheduler can map workloads onto the most appropriate core type. In general, small cores can achieve good performance if the workload inherently has high levels of ILP. On the other hand, big cores provide good performance if the workload exhibits high levels of MLP or requires the ILP to be extracted dynamically. This paper proposes Performance Impact Estimation (PIE) as a mechanism to predict which workload-to-core mapping is likely to provide the best performance. PIE collects CPI stack, MLP and ILP profile information, and estimates performance if the workload were to run on a different core type. Dynamic PIE adjusts the scheduling at runtime and thereby exploits fine-grained time-varying execution behavior. We show that PIE requires limited hardware support and can improve system performance by an average of 5.5% over recent state-of-the-art scheduling proposals and by 8.7% over a sampling-based scheduling policy.