SIGMOD '86 Proceedings of the 1986 ACM SIGMOD international conference on Management of data
Cost-performance analysis of heterogeneity in supercomputer architectures
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Pipeline gating: speculation control for energy reduction
Proceedings of the 25th annual international symposium on Computer architecture
Piranha: a scalable architecture based on single-chip multiprocessing
Proceedings of the 27th annual international symposium on Computer architecture
Energy efficient CMOS microprocessor design
HICSS '95 Proceedings of the 28th Hawaii International Conference on System Sciences
Dynamic Thermal Management for High-Performance Microprocessors
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Runtime Power Monitoring in High-End Processors: Methodology and Empirical Data
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance
Proceedings of the 31st annual international symposium on Computer architecture
Best of Both Latency and Throughput
ICCD '04 Proceedings of the IEEE International Conference on Computer Design
Power-Performance Implications of Thread-level Parallelism on Chip Multiprocessors
ISPASS '05 Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005
A performance-conserving approach for reducing peak power consumption in server systems
Proceedings of the 19th annual international conference on Supercomputing
Heterogeneous Chip Multiprocessors
Computer
Power-performance considerations of parallel computing on chip multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
Core architecture optimization for heterogeneous chip multiprocessors
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Power efficiency for variation-tolerant multicore processors
Proceedings of the 2006 international symposium on Low power electronics and design
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Conserving network processor power consumption by exploiting traffic variability
ACM Transactions on Architecture and Code Optimization (TACO)
Limiting the power consumption of main memory
Proceedings of the 34th annual international symposium on Computer architecture
Addressing instruction fetch bottlenecks by using an instruction register file
Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Microprocessors in the era of terascale integration
Proceedings of the conference on Design, automation and test in Europe
Evaluating design tradeoffs in on-chip power management for CMPs
ISLPED '07 Proceedings of the 2007 international symposium on Low power electronics and design
Accurate branch prediction for short threads
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Merge: a programming model for heterogeneous multi-core systems
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Dependability, power, and performance trade-off on a multicore processor
Proceedings of the 2008 Asia and South Pacific Design Automation Conference
Efficient operating system scheduling for performance-asymmetric multi-core architectures
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Pangaea: a tightly-coupled IA32 heterogeneous chip multiprocessor
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Multitasking workload scheduling on flexible-core chip multiprocessors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Dynamic power management framework for multi-core portable embedded system
IFMT '08 Proceedings of the 1st international forum on Next-generation multicore/manycore technologies
Power-Aware Real-Time Scheduling upon Dual CPU Type Multiprocessor Platforms
OPODIS '08 Proceedings of the 12th International Conference on Principles of Distributed Systems
Statistical profiling-based techniques for effective power provisioning in data centers
Proceedings of the 4th ACM European conference on Computer systems
IWMSE '09 Proceedings of the 2009 ICSE Workshop on Multicore Software Engineering
Maximizing power efficiency with asymmetric multicore systems
Communications of the ACM - Finding the Fun in Computer Science Education
Age based scheduling for asymmetric multiprocessors
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Multiple clock and voltage domains for chip multi processors
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
AASH: an asymmetry-aware scheduler for hypervisors
Proceedings of the 6th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Bias scheduling in heterogeneous multi-core architectures
Proceedings of the 5th European conference on Computer systems
A comprehensive scheduler for asymmetric multicore systems
Proceedings of the 5th European conference on Computer systems
Proceedings of the 7th ACM international conference on Computing frontiers
Proceedings of the 7th ACM international conference on Computing frontiers
Area-efficient floorplans and interconnects for homogeneous multi-core architectures
International Journal of High Performance Systems Architecture
Modeling critical sections in Amdahl's law and its implications for multicore design
Proceedings of the 37th annual international symposium on Computer architecture
Data marshaling for multi-core architectures
Proceedings of the 37th annual international symposium on Computer architecture
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Journal of Parallel and Distributed Computing
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Analyzing performance asymmetric multicore processors for latency sensitive datacenter applications
HotPower'10 Proceedings of the 2010 international conference on Power aware computing and systems
Bridging functional heterogeneity in multicore architectures
ACM SIGOPS Operating Systems Review
Efficient interaction between OS and architecture in heterogeneous platforms
ACM SIGOPS Operating Systems Review
Design and management of 3D-stacked NUCA cache for chip multiprocessors
Proceedings of the 21st edition of the great lakes symposium on Great lakes symposium on VLSI
Benefits and limitations of tapping into stored energy for datacenters
Proceedings of the 38th annual international symposium on Computer architecture
Chameleon: operating system support for dynamic processors
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Bottleneck identification and scheduling in multithreaded applications
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
ACM Transactions on Computer Systems (TOCS)
KnightShift: shifting the I/O burden in datacenters to management processor for energy efficiency
ISCA'10 Proceedings of the 2010 international conference on Computer Architecture
Dynamic binary rewriting and migration for shared-ISA asymmetric, multicore processors: summary
Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Asymmetric Cache Coherency: Policy Modifications to Improve Multicore Performance
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
When less is more (LIMO):controlled parallelism forimproved efficiency
Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems
ACM Transactions on Design Automation of Electronic Systems (TODAES) - Special section on adaptive power management for energy and temperature-aware computing systems
Understanding fundamental design choices in single-ISA heterogeneous multicore architectures
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
DNA-inspired scheme for building the energy profile of HPC systems
E2DC'12 Proceedings of the First international conference on Energy Efficient Data Centers
KnightShift: Scaling the Energy Proportionality Wall through Server-Level Heterogeneity
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Composite Cores: Pushing Heterogeneity Into a Core
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Importance of single-core performance in the multicore era
ACSC '12 Proceedings of the Thirty-fifth Australasian Computer Science Conference - Volume 122
Agile, efficient virtualization power management with low-latency server power states
Proceedings of the 40th Annual International Symposium on Computer Architecture
Utility-based acceleration of multithreaded applications on asymmetric CMPs
Proceedings of the 40th Annual International Symposium on Computer Architecture
Criticality stacks: identifying critical threads in parallel programs using synchronization behavior
Proceedings of the 40th Annual International Symposium on Computer Architecture
Energy saving strategies for parallel applications with point-to-point communication phases
Journal of Parallel and Distributed Computing
Design and performance evaluation of NUMA-aware RDMA-based end-to-end data transfer systems
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Fairness-aware scheduling on single-ISA heterogeneous multi-cores
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Asymmetric scaling on network packet processors in the dark silicon era
ANCS '13 Proceedings of the ninth ACM/IEEE symposium on Architectures for networking and communications systems
The benefit of SMT in the multi-core era: flexibility towards degrees of thread-level parallelism
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
ad-heap: an Efficient Heap Data Structure for Asymmetric Multicore Processors
Proceedings of Workshop on General Purpose Processing Using GPUs
Hi-index | 0.00 |
This paper is motivated by three recent trends in computer design. First, chip multi-processors (CMPs) with increasing numbers of CPU cores per chip are becoming common. Second, multi-threaded software that can take advantage of CMPs will soon become prevalent. Due to the nature of the algorithms, these multi-threaded programs inherently will have phases of sequential execution; Amdahlýs law dictates that the speedup of such parallel programs will be limited by the sequential portion of the computation. Finally, increasing levels of on-chip integration coupled with a slowing rate of reduction in supply voltage make power consumption a first order design constraint. Given this environment, our goal is to minimize the execution times of multi-threaded programs containing nontrivial parallel and sequential phases, while keeping the CMPýs total power consumption within a fixed budget. In order to mitigate the effects of Amdahlýs law, in this paper we make a compelling case for varying the amount of energy expended to process instructions according to the amount of available parallelism. Using the equation, Power=Energy per instruction (EPI) * Instructions per second (IPS), we propose that during phases of limited parallelism (low IPS) the chip multi-processor will spend more EPI; similarly, during phases of higher parallelism (high IPS) the chip multi-processor will spend less EPI; in both scenarios power is fixed. We evaluate the performance benefits of an EPI throttle on an asymmetric multiprocessor (AMP) prototyped from a physical 4-way Xeon SMP server. Using a wide range of multi-threaded programs, we show a 38% wall clock speedup on an AMP compared to a standard SMP that uses the same power. We also measure the supply current on a 4-way SMP server while running the multi-threaded programs and use the measured data as input to a software simulator that implements a more flexible EPI throttle. The results from the measurement-driven simulation show performance benefits comparable to the AMP prototype. We analyze the results from both techniques, explain why and when an EPI throttle works well, and conclude with a discussion of the challenges in building practical EPI throttles.