One general avenue to optimized performance on large and complex systems is to approach optimization from a global perspective of the complete system, customized for each application, i.e., application-centric optimization. Lately, there have been encouraging developments in reconfigurable operating systems and hardware that will enable customized optimization. For example, machines built with PIMs and FPGAs can be quickly reconfigured to better fit a certain application, and operating systems such as IBM's K42 can have their services customized to fit the needs and characteristics of an application. While progress in operating systems and hardware has made reconfiguration possible, we still need strategies and techniques to exploit it for improved application performance.

In this paper, we describe the approach we are using in our smart application (SMARTAPPS) project. In the SMARTAPP executable, the compiler embeds most run-time system services and a feedback loop to monitor performance and trigger run-time adaptations. At run time, after incorporating the code's input and determining the system's state, the SMARTAPP performs an instance-specific optimization. During execution, the application continually monitors its performance and the available resources to determine whether restructuring should occur. The framework includes mechanisms for performing the actual restructuring at various levels, including algorithmic adaptation, tuning of reconfigurable OS services (scheduling policy, page size, etc.), and system configuration (e.g., number of processors). This paper concentrates on the techniques for providing customized system services for communication, thread scheduling, memory management, and performance monitoring and modeling.
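To make the monitor-and-adapt feedback loop concrete, here is a minimal Python sketch of algorithmic adaptation only. It is an illustration under stated assumptions, not the SMARTAPPS implementation: the class `AdaptiveRunner`, its fields, the tolerance threshold, and the rule of replacing a prediction with the observed cost are all hypothetical. The measured cost is passed in by the caller so that the example stays deterministic; a real system would obtain it from performance monitoring.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AdaptiveRunner:
    """Hypothetical monitor-and-adapt loop (illustrative; not the SMARTAPPS API)."""
    variants: Dict[str, Callable[[List[int]], int]]  # interchangeable algorithms
    predicted_cost: Dict[str, float]                 # model's expected cost per call
    tolerance: float = 1.5                           # allowed measured/predicted ratio
    current: str = ""
    log: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Instance-specific choice at start-up: cheapest variant per the model.
        self.current = min(self.predicted_cost, key=self.predicted_cost.get)

    def step(self, data: List[int], measured_cost: float) -> int:
        """Run the current variant, then decide whether to restructure."""
        result = self.variants[self.current](data)
        # Feedback: if the observation is far worse than predicted, correct
        # the model (assumed rule: overwrite with the observation) and re-select.
        if measured_cost > self.tolerance * self.predicted_cost[self.current]:
            self.predicted_cost[self.current] = measured_cost
            best = min(self.predicted_cost, key=self.predicted_cost.get)
            if best != self.current:
                self.log.append(f"{self.current} -> {best}")
                self.current = best
        return result


def loop_sum(xs: List[int]) -> int:
    """Alternative variant that computes the same result as sum()."""
    total = 0
    for x in xs:
        total += x
    return total


runner = AdaptiveRunner(
    variants={"builtin_sum": sum, "loop_sum": loop_sum},
    predicted_cost={"builtin_sum": 1.0, "loop_sum": 2.0},
)
runner.step([1, 2, 3], measured_cost=1.2)  # within tolerance: no change
runner.step([1, 2, 3], measured_cost=5.0)  # far off the model: re-select
print(runner.current, runner.log)
```

The same pattern would apply to the other adaptation levels named above (OS service tuning, system configuration) by replacing the variant table with a table of configurations; that generalization is likewise an assumption of this sketch.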