There is growing interest in replacing traditional servers with low-power multicore systems such as the ARM Cortex-A9. However, such systems are typically provisioned for mobile applications, which have lower memory and I/O requirements than server applications. Thus, the extent of the imbalance between application demands and system resources, and its impact on the energy-efficient execution of server workloads, is unclear. This paper proposes a trace-driven analytical model for understanding the energy performance of server workloads on ARM Cortex-A9 multicore systems. Key to our approach is modeling the degree of overlap among CPU core, memory and I/O resources, and estimating the number of cores and the clock frequency that optimize energy performance without compromising execution time. Since energy usage is the product of utilized power and execution time, the model first estimates the execution time of a program. CPU time, which accounts for both core and memory response time, is modeled as an M/G/1 queuing system. Workload characterization of high performance computing, web hosting and financial computing applications shows that bursty memory traffic fits a Pareto distribution, whereas non-bursty memory traffic is exponentially distributed. Our analysis using these server workloads reveals that not all server workloads benefit from a higher number of cores or higher clock frequencies. Applying our model, we predict configurations that increase energy efficiency by 10% without turning off cores, and by up to one third when unutilized cores are shut down. For memory-bound programs, we show that the limited memory bandwidth might increase both execution time and energy usage, to the point where the energy cost might be higher than on a typical x64 multicore system. Lastly, we show that increasing memory and I/O bandwidth can improve both the execution time and the energy usage of server workloads on ARM Cortex-A9 systems.
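To make the two modeling ingredients named above concrete, the following is a minimal Python sketch. It is not the paper's model. It only applies the textbook Pollaczek-Khinchine formula for the mean response time of an M/G/1 queue, using the standard moments of the exponential and Pareto (Type I) distributions, and then computes energy as power multiplied by time. All function and parameter names are hypothetical and chosen for illustration, and the numbers in the usage lines are made up.

```python
def mg1_response_time(arrival_rate, mean_service, second_moment_service):
    """Mean response time of an M/G/1 queue (Pollaczek-Khinchine formula).

    arrival_rate: Poisson arrival rate (requests per second)
    mean_service: E[S], mean service time (seconds)
    second_moment_service: E[S^2], second moment of service time
    """
    utilization = arrival_rate * mean_service
    if not 0 <= utilization < 1:
        raise ValueError("queue is unstable: utilization must be in [0, 1)")
    mean_wait = arrival_rate * second_moment_service / (2 * (1 - utilization))
    return mean_service + mean_wait


def exponential_moments(mean):
    """E[S] and E[S^2] of an exponential distribution with the given mean."""
    return mean, 2 * mean ** 2


def pareto_moments(shape, scale):
    """E[S] and E[S^2] of a Pareto (Type I) distribution.

    The second moment is finite only for shape > 2, so this sketch
    rejects heavier tails instead of returning an infinite delay.
    """
    if shape <= 2:
        raise ValueError("second moment is infinite for shape <= 2")
    mean = shape * scale / (shape - 1)
    second_moment = shape * scale ** 2 / (shape - 2)
    return mean, second_moment


def energy_joules(power_watts, time_seconds):
    """Energy as the product of average power and execution time."""
    return power_watts * time_seconds


# Illustrative values only (assumptions, not measurements from the paper).
non_bursty = mg1_response_time(0.5, *exponential_moments(1.0))
bursty = mg1_response_time(0.2, *pareto_moments(3.0, 1.0))
print(non_bursty, bursty, energy_joules(2.0, non_bursty))
```

With an exponential service time the formula reduces to the familiar M/M/1 result. The Pareto case shows how a heavier tail inflates the queuing delay through the second moment, which is one reason bursty and non-bursty traffic call for different service-time distributions.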