An analytical approach to performance/cost modeling of parallel computers
Journal of Parallel and Distributed Computing
The case for a single-chip multiprocessor
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
The Alpha 21264 Microprocessor
IEEE Micro
IEEE Transactions on Parallel and Distributed Systems
A Static Scheduling Heuristic for Heterogeneous Processors
Euro-Par '96 Proceedings of the Second International Euro-Par Conference on Parallel Processing-Volume II
Proceedings of the 30th annual international symposium on Computer architecture
Computer Architecture: A Quantitative Approach
Computer Architecture: A Quantitative Approach
Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance
Proceedings of the 31st annual international symposium on Computer architecture
Conjoined-Core Chip Multiprocessing
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
VLSID '05 Proceedings of the 18th International Conference on VLSI Design held jointly with 4th International Conference on Embedded Systems Design
POWER4 system microarchitecture
IBM Journal of Research and Development
Converting massive TLP to DLP: a special-purpose processor for molecular orbital computations
Proceedings of the 4th international conference on Computing frontiers
Predictive thread-to-core assignment on a heterogeneous multi-core processor
Proceedings of the 4th workshop on Programming languages and operating systems
Performance Implications of Cache Affinity on Multicore Processors
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
Adapting Application Mapping to Systematic Within-Die Process Variations on Chip Multiprocessors
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Novel task migration framework on configurable heterogeneous MPSoC platforms
Proceedings of the 2009 Asia and South Pacific Design Automation Conference
HASS: a scheduler for heterogeneous multicore systems
ACM SIGOPS Operating Systems Review
IWMSE '09 Proceedings of the 2009 ICSE Workshop on Multicore Software Engineering
Maximizing power efficiency with asymmetric multicore systems
Communications of the ACM - Finding the Fun in Computer Science Education
ACM SIGOPS Operating Systems Review
Mapping stream programs onto heterogeneous multiprocessor systems
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
Efficient program scheduling for heterogeneous multi-core processors
Proceedings of the 46th Annual Design Automation Conference
Age based scheduling for asymmetric multiprocessors
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
AASH: an asymmetry-aware scheduler for hypervisors
Proceedings of the 6th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Bias scheduling in heterogeneous multi-core architectures
Proceedings of the 5th European conference on Computer systems
A comprehensive scheduler for asymmetric multicore systems
Proceedings of the 5th European conference on Computer systems
Proceedings of the 7th ACM international conference on Computing frontiers
Scalable thread scheduling and global power management for heterogeneous many-core architectures
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
AKULA: a toolset for experimenting and developing thread placement algorithms on multicore systems
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Journal of Parallel and Distributed Computing
Cost-aware function migration in heterogeneous systems
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Efficient interaction between OS and architecture in heterogeneous platforms
ACM SIGOPS Operating Systems Review
EURASIP Journal on Embedded Systems
HeteroScouts: hardware assist for OS scheduling in heterogeneous CMPs
Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
HeteroScouts: hardware assist for OS scheduling in heterogeneous CMPs
ACM SIGMETRICS Performance Evaluation Review - Performance evaluation review
Efficiently exploiting memory level parallelism on asymmetric coupled cores in the dark silicon era
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Thread Tranquilizer: Dynamically reducing performance variation
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Thread scheduling for heterogeneous multicore processors using phase identification
ACM SIGMETRICS Performance Evaluation Review
ACM Transactions on Computer Systems (TOCS)
Buffer sizing for self-timed stream programs on heterogeneous distributed memory multiprocessors
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Phase-based tuning for better utilization of performance-asymmetric multicore processors
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
A virtual memory based runtime to support multi-tenancy in clusters with GPUs
Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Scheduling Concurrent Applications on a Cluster of CPU-GPU Nodes
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Energy-efficient scheduling on heterogeneous multi-core architectures
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Asymmetric Cache Coherency: Policy Modifications to Improve Multicore Performance
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Lucky scheduling for energy-efficient heterogeneous multi-core systems
HotPower'12 Proceedings of the 2012 USENIX conference on Power-Aware Computing and Systems
ACM Transactions on Design Automation of Electronic Systems (TODAES) - Special section on adaptive power management for energy and temperature-aware computing systems
Autonomic load balancing mechanisms in the P2P desktop grid
Proceedings of the 2013 ACM Cloud and Autonomic Computing Conference
Semi-automatic restructuring of offloadable tasks for many-core accelerators
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
An opportunistic prediction-based thread scheduling to maximize throughput/watt in AMPs
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
When slower is faster: on heterogeneous multicores for reliable systems
USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference
Scheduling concurrent applications on a cluster of CPU-GPU nodes
Future Generation Computer Systems
Trace based phase prediction for tightly-coupled heterogeneous cores
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
ACM Transactions on Architecture and Code Optimization (TACO)
Power-performance modeling on asymmetric multi-cores
Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems
Energy-aware thread co-location in heterogeneous multicore processors
Proceedings of the Eleventh ACM International Conference on Embedded Software
Hi-index | 0.00 |
In a multi-programmed computing environment, threads of execution exhibit different runtime characteristics and hardware resource requirements. Not only do the behaviors of distinct threads differ, but each thread may also present diversity in its performance and resource usage over time. A heterogeneous chip multiprocessor (CMP) architecture consists of processor cores and caches of varying size and complexity. Prior work has shown that heterogeneous CMPs can meet the needs of a multi-programmed computing environment better than a homogeneous CMP system. In fact, the use of a combination of cores with different caches and instruction issue widths better accommodates threads with different computational requirements.A central issue in the design and use of heterogeneous systems is to determine an assignment of tasks to processors which better exploits the hardware resources in order to improve performance. In this paper we argue that the benefits of heterogeneous CMPs are bolstered by the usage of a dynamic assignment policy, i.e., a runtime mechanism which observes the behavior of the running threads and exploits thread migration between the cores. We validate our analysis by means of simulation. Specifically, our model assumes a combination of Alpha EV5 and Alpha EV6 processors and of integer and floating point programs from the SPEC2000 benchmark suite. We show that a dynamic assignment can outperform a static one by 20% to 40% on average and by as much as 80% in extreme cases, depending on the degree of multithreading simulated.