Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading
ACM Transactions on Computer Systems (TOCS)
Symbiotic jobscheduling for a simultaneous multithreaded processor
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Process cruise control: event-driven clock scaling for dynamic power management
CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems
Runtime Power Monitoring in High-End Processors: Methodology and Empirical Data
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Runtime Empirical Selection of Loop Schedulers on Hyperthreaded SMPs
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Exploiting Barriers to Optimize Power Consumption of CMPs
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Adaptive execution techniques for SMT multiprocessor architectures
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Methods for Modeling Resource Contention on Simultaneous Multithreading Processors
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Exploiting unbalanced thread scheduling for energy and performance on a CMP of SMT processors
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
A comparison of online and offline strategies for program adaptation
ACM-SE 45 Proceedings of the 45th annual southeast regional conference
A programming environment with runtime energy characterization for energy-aware applications
ISLPED '07 Proceedings of the 2007 international symposium on Low power electronics and design
Processor hardware counter statistics as a first-class system resource
HOTOS'07 Proceedings of the 11th USENIX workshop on Hot topics in operating systems
Prediction models for multi-dimensional power-performance optimization on many cores
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Adagio: making DVS practical for complex HPC applications
Proceedings of the 23rd international conference on Supercomputing
Adapting application execution in CMPs using helper threads
Journal of Parallel and Distributed Computing
Energy-Efficient Cluster Computing via Accurate Workload Characterization
CCGRID '09 Proceedings of the 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Thread tailor: dynamically weaving threads together for efficient, adaptive parallel applications
Proceedings of the 37th annual international symposium on Computer architecture
Optimizing throughput and latency under given power budget for network packet processing
INFOCOM'10 Proceedings of the 29th conference on Information communications
A workload-aware mapping approach for data-parallel programs
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Parallelism orchestration using DoPE: the degree of parallelism executive
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Principles of energy efficiency in high performance computing
ICT-GLOW'11 Proceedings of the First international conference on Information and communication on technology for the fight against global warming
Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Power-aware predictive models of hybrid (MPI/OpenMP) scientific applications on multicore systems
Computer Science - Research and Development
DNA-inspired scheme for building the energy profile of HPC systems
E2DC'12 Proceedings of the First international conference on Energy Efficient Data Centers
A self-adaptive heterogeneous multi-core architecture for embedded real-time video object tracking
Journal of Real-Time Image Processing
Predicting Performance Impact of DVFS for Realistic Memory Systems
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Exploring hardware overprovisioning in power-constrained, high performance computing
Proceedings of the 27th international ACM conference on International conference on supercomputing
Holistic run-time parallelism management for time and energy efficiency
Proceedings of the 27th international ACM conference on International conference on supercomputing
Adaptive parallelism for web search
Proceedings of the 8th ACM European Conference on Computer Systems
MuMMI: multiple metrics modeling infrastructure for exploring performance and power modeling
Proceedings of the Conference on Extreme Science and Engineering Discovery Environment: Gateway to Discovery
Hi-index | 0.00 |
With high-end systems featuring multicore/multithreaded processors and high component density, power-aware high-performance multithreading libraries become a critical element of the system software stack. Online power and performance adaptation of multithreaded code from within user-level runtime libraries is a relatively new and unexplored area of research. We present a user-level library framework for nearly optimal online adaptation of multithreaded codes for low-power, high-performance execution. Our framework operates by regulating concurrency and changing the processors/threads configuration as the program executes. It is innovative in that it uses fast, runtime performance prediction derived from hardware event-driven profiling, to select thread granularities that achieve nearly optimal energy-efficiency points. The use of predictors substantially reduces the runtime cost of granularity control and program adaptation. Our framework achieves performance and ED2 (energy-delay-squared) levels which are: i) comparable to or better than those of oracle-derived offline predictors; ii) significantly better than those of online predictors using exhaustive or localized linear search. The complete prediction and adaptation framework is implemented on a real multi-SMT system with Intel Hyperthreaded processors and embeds adaptation capabilities in OpenMP programs.