The Stanford Dash Multiprocessor
Computer
The Stanford FLASH multiprocessor
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
A scalable method for run-time loop parallelization
International Journal of Parallel Programming
IEEE Transactions on Parallel and Distributed Systems
Adaptive reduction parallelization techniques
Proceedings of the 14th international conference on Supercomputing
Parallel Programming with Polaris
Computer
SmartApps: An Application Centric Approach to High Performance Computing
LCPC '00 Proceedings of the 13th International Workshop on Languages and Compilers for Parallel Computing-Revised Papers
Architectural Support for Parallel Reductions in Scalable Shared-Memory Multiprocessors
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
MINT: A Front End for Efficient Simulation of Shared-Memory Multiprocessors
MASCOTS '94 Proceedings of the Second International Workshop on Modeling, Analysis, and Simulation On Computer and Telecommunication Systems
Parallelizing while loops for multiprocessor systems
IPPS '95 Proceedings of the 9th International Symposium on Parallel Processing
Impulse: Building a Smarter Memory Controller
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
An Direct-Execution Framework for Fast and Accurate Simulation of Superscalar Processors
PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
Improving Compiler and Run-Time Support for Adaptive Irregular Codes
PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
Run-time parallelization: A framework for parallel computation
Run-time parallelization: A framework for parallel computation
The R-LRPD Test: Speculative Parallelization of Partially Parallel Loops
IPDPS '02 Proceedings of the 16th International Symposium on Parallel and Distributed Processing
Speculative parallelization of partial reduction variables
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Hi-index | 0.00 |
State-of-the-art run-time systems are a poor match to diverse, dynamic distributed applications because they are designed to provide support to a wide variety of applications, without much customization to individual specific requirements. Little or no guiding information flows directly from the application to the run-time system to allow the latter to fully tailor its services to the application. As a result, the performance is disappointing. To address this problem, we propose application-centric computing, or SMART APPLICATIONS. In the executable of smart applications, the compiler embeds most run-time system services, and a performance-optimizing feedback loop that monitors the application's performance and adaptively reconfigures the application and the OS/hardware platform. At run-time, after incorporating the code's input and the system's resources and state, the SMARTAPP performs a global optimization. This optimization is instance specific and thus much more tractable than a global generic optimization between application, OS and hardware. The resulting code and resource customization should lead to major speedups. In this paper, we first describe the overall architecture of SMARTAPPS and then present some achievements to date, focusing on compiler-assisted software and hardware techniques for parallelizing reduction operations. These illustrate SMARTAPPS use of adaptive algorithm selection and moderately reconfigurable hardware.