State-of-the-art run-time systems are a poor match for diverse, dynamic distributed applications because they are designed to support a wide variety of applications, with little customization to the specific requirements of any one of them. Little or no guiding information flows directly from the application to the run-time system to allow the latter to tailor its services to the application. As a result, performance is disappointing. To address this problem, we propose application-centric computing, or smart applications (SmartApps). In the executable of a SmartApp, the compiler embeds most run-time system services together with a performance-optimizing feedback loop that monitors the application's performance and adaptively reconfigures the application and the OS/hardware platform. At run time, after incorporating the code's input and the system's resources and state, the SmartApp performs a global optimization. This optimization is instance-specific and thus much more tractable than a generic global optimization across application, OS, and hardware. The resulting code and resource customization should lead to major speedups. In this paper, we first describe the overall architecture of SmartApps and then present the achievements to date: run-time optimizations, performance modeling, and moderately reconfigurable hardware.
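The monitor-and-reconfigure feedback loop mentioned in the abstract can be illustrated with a small sketch. This is not the SmartApps implementation: the names `run_with_feedback`, `variants`, `cost_of`, and `tolerance` are hypothetical, and the cost is assumed to be supplied by the caller rather than read from hardware counters. The sketch measures each code variant once, keeps using the cheapest one, and re-measures the alternatives when the chosen variant becomes noticeably slower than its last measurement.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def run_with_feedback(
    variants: Dict[str, Callable[[Sequence[int]], int]],
    batches: List[Sequence[int]],
    cost_of: Callable[[str, Sequence[int]], float],
    tolerance: float = 1.5,
) -> Tuple[List[int], List[str]]:
    """Process every batch, adapting the choice of variant to observed cost.

    All parameter names are illustrative assumptions, not part of any real API.
    """
    observed: Dict[str, float] = {}  # last measured cost per element, by variant
    untried = list(variants)         # variants still to be measured in this round
    results: List[int] = []
    chosen: List[str] = []
    for batch in batches:
        # Sampling phase: measure each pending variant once.
        # Otherwise, exploit the variant with the lowest recorded cost.
        name = untried.pop(0) if untried else min(observed, key=observed.get)
        results.append(variants[name](batch))
        cost = cost_of(name, batch) / max(len(batch), 1)
        # Feedback: if this variant slowed down well beyond its last
        # measurement, schedule the other variants for re-measurement.
        if name in observed and cost > tolerance * observed[name]:
            untried = [v for v in variants if v != name]
        observed[name] = cost
        chosen.append(name)
    return results, chosen
```

A caller would pass functionally equivalent variants (for example, a serial and a blocked reduction) and a cost function backed by timers or a performance model. The sketch only re-measures alternatives after a slowdown, so a variant that later becomes cheap again is not rediscovered; a fuller design would also re-sample periodically.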