Theoretical modeling of superscalar processor performance
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Digital Control of Dynamic Systems
Digital Control of Dynamic Systems
Floating Point Division and Square Root Algorithms and Implementation in the AMD-K7 Microprocessor
ARITH '99 Proceedings of the 14th IEEE Symposium on Computer Arithmetic
Built-In Current Sensor for IDDQ Testing in Deep Submicron CMOS
VTS '99 Proceedings of the 1999 17TH IEEE VLSI Test Symposium
Runtime Power Monitoring in High-End Processors: Methodology and Empirical Data
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Temperature-aware microarchitecture: Modeling and implementation
ACM Transactions on Architecture and Code Optimization (TACO)
Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Exploiting Barriers to Optimize Power Consumption of CMPs
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
The Thrifty Barrier: Energy-Aware Synchronization in Shared-Memory Multiprocessors
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Virtual hierarchies to support server consolidation
Proceedings of the 34th annual international symposium on Computer architecture
Analysis of redundancy and application balance in the SPEC CPU2006 benchmark suite
Proceedings of the 34th annual international symposium on Computer architecture
ICAC '07 Proceedings of the Fourth International Conference on Autonomic Computing
Thousand core chips: a technology perspective
Proceedings of the 44th annual Design Automation Conference
No "power" struggles: coordinated multi-level power management for the data center
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
The PARSEC benchmark suite: characterization and architectural implications
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Multi-optimization power management for chip multiprocessors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Meeting points: using thread criticality to adapt multicore hardware to parallel regions
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems
Proceedings of the 36th annual international symposium on Computer architecture
Thread motion: fine-grained power management for multi-core systems
Proceedings of the 36th annual international symposium on Computer architecture
Temperature-constrained power control for chip multiprocessors with online model estimation
Proceedings of the 36th annual international symposium on Computer architecture
Virtual machine power metering and provisioning
Proceedings of the 1st ACM symposium on Cloud computing
Coordinated power management of voltage islands in CMPs
Proceedings of the ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Scalable thread scheduling and global power management for heterogeneous many-core architectures
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
ORION 2.0: a fast and accurate NoC power and area model for early-stage design space exploration
Proceedings of the Conference on Design, Automation and Test in Europe
Distributed peak power management for many-core architectures
Proceedings of the Conference on Design, Automation and Test in Europe
CPM in CMPs: Coordinated Power Management in Chip-Multiprocessors
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Coordinating processor and main memory for efficientserver power control
Proceedings of the international conference on Supercomputing
Leveraging stored energy for handling power emergencies in aggressively provisioned datacenters
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Thermal management of a many-core processor under fine-grained parallelism
Euro-Par'11 Proceedings of the 2011 international conference on Parallel Processing
PGCapping: exploiting power gating for power capping and core lifetime balancing in CMPs
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
PEPON: performance-aware hierarchical power budgeting for NoC based multicores
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Aggressive Datacenter Power Provisioning with Batteries
ACM Transactions on Computer Systems (TOCS)
Self-adaptive hybrid dynamic power management for many-core systems
Proceedings of the Conference on Design, Automation and Test in Europe
Proceedings of the Conference on Design, Automation and Test in Europe
Hierarchical power management for asymmetric multi-core in dark silicon era
Proceedings of the 50th Annual Design Automation Conference
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Energy-efficient virtual machine scheduling in performance-asymmetric multi-core architectures
Proceedings of the 8th International Conference on Network and Service Management
Coordinated power-performance optimization in manycores
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Price theory based power management for heterogeneous multi-cores
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Agent-based distributed power management for kilo-core processors
Proceedings of the International Conference on Computer-Aided Design
Hi-index | 0.00 |
Optimizing the performance of a multi-core microprocessor within a power budget has recently received a lot of attention. However, most existing solutions are centralized and cannot scale well with the rapidly increasing level of core integration. While a few recent studies propose power control algorithms for many-core architectures, those solutions assume that the workload of every core is independent and therefore cannot effectively allocate power based on thread criticality to accelerate multi-threaded parallel applications, which are expected to be the primary workloads of many-core architectures. This paper presents a scalable power control solution for many-core microprocessors that is specifically designed to handle realistic workloads, i.e., a mixed group of single-threaded and multi-threaded applications. Our solution features a three-layer design. First, we adopt control theory to precisely control the power of the entire chip to its chip-level budget by adjusting the aggregated frequency of all the cores on the chip. Second, we dynamically group cores running the same applications and then partition the chip-level aggregated frequency quota among different groups for optimized overall microprocessor performance. Finally, we partition the group-level frequency quota among the cores in each group based on the measured thread criticality for shorter application completion time. As a result, our solution can optimize the microprocessor performance while precisely limiting the chip-level power consumption below the desired budget. Empirical results on a 12-core hardware testbed show that our control solution can provide precise power control, as well as 17% and 11% better application performance than two state-of-the-art solutions, on average, for mixed PARSEC and SPEC benchmarks. Furthermore, our extensive simulation results for 32, 64, and 128 cores, as well as overhead analysis for up to 4,096 cores, demonstrate that our solution is highly scalable to many-core architectures.