SmartApps: An Application Centric Approach to High Performance Computing

Authors:
Lawrence Rauchwerger; Nancy M. Amato; Josep Torrellas
Venue:
LCPC '00: Proceedings of the 13th International Workshop on Languages and Compilers for Parallel Computing (Revised Papers)
Year:
2000

Abstract

State-of-the-art run-time systems are a poor match for diverse, dynamic distributed applications because they are designed to support a wide variety of applications, with little customization for the specific requirements of any individual application. Little or no guiding information flows directly from the application to the run-time system, so the latter cannot fully tailor its services to the application. As a result, performance is disappointing. To address this problem, we propose application-centric computing, or smart applications (SmartApps). In the executable of a smart application, the compiler embeds most run-time system services, as well as a performance-optimizing feedback loop that monitors the application's performance and adaptively reconfigures both the application and the OS/hardware platform. At run time, after incorporating the code's input and the system's resources and state, the SmartApp performs a global optimization. Because this optimization is instance-specific, it is much more tractable than a generic global optimization across application, OS, and hardware. The resulting customization of code and resources should lead to major speedups. In this paper, we first describe the overall architecture of SmartApps and then present the achievements to date: run-time optimizations, performance modeling, and moderately reconfigurable hardware.
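The feedback loop is described here only at the architectural level. As a rough illustration of the monitor-and-reconfigure idea, the following Python sketch assumes the application carries several interchangeable implementations ("variants") of one computation. Periodically it times all of them on the live input, and otherwise it runs whichever variant measured fastest. `FeedbackLoop`, `recheck_every`, and the two summation variants are hypothetical names, not part of SmartApps. The sketch adapts only application code, not the OS or hardware, and a deterministic fake clock stands in for real timing so the example is reproducible.

```python
import time
from typing import Any, Callable, Dict, List, Optional


class FeedbackLoop:
    """Hypothetical SmartApp-style feedback loop (illustrative assumption only).

    `variants` maps a name to an alternative implementation of the same pure
    computation. On the first call, and every `recheck_every` calls after that,
    the loop times every variant on the current input and remembers the
    fastest; on all other calls it runs only that variant.
    """

    def __init__(self, variants: Dict[str, Callable[..., Any]],
                 recheck_every: int = 100,
                 clock: Callable[[], float] = time.perf_counter) -> None:
        if not variants:
            raise ValueError("need at least one variant")
        self.variants = variants
        self.recheck_every = recheck_every
        self.clock = clock
        self.best: Optional[str] = None
        self.calls = 0
        self.chosen: List[str] = []  # variant selected on each call

    def _timed(self, name: str, *args: Any) -> tuple:
        start = self.clock()
        result = self.variants[name](*args)
        return result, self.clock() - start

    def run(self, *args: Any) -> Any:
        explore = self.calls % self.recheck_every == 0
        self.calls += 1
        if explore:
            # Monitoring phase: measure every variant on the live input.
            costs: Dict[str, float] = {}
            result = None
            for name in self.variants:
                result, costs[name] = self._timed(name, *args)
            self.best = min(costs, key=costs.__getitem__)
        else:
            # Steady state: run only the variant that measured fastest.
            result = self.variants[self.best](*args)
        self.chosen.append(self.best)
        return result


class FakeClock:
    """Deterministic clock so the example does not depend on real timing."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


clock = FakeClock()


def serial_sum(xs):
    clock.now += 5.0  # pretend this variant costs 5 time units
    return sum(xs)


def chunked_sum(xs):
    clock.now += 1.0  # pretend this variant costs 1 time unit
    return sum(sum(xs[i:i + 4]) for i in range(0, len(xs), 4))


loop = FeedbackLoop({"serial": serial_sum, "chunked": chunked_sum},
                    recheck_every=3, clock=clock)
for _ in range(4):
    total = loop.run(list(range(10)))
print(total, loop.chosen)  # 45 ['chunked', 'chunked', 'chunked', 'chunked']
```

Re-measuring every `recheck_every` calls is about the simplest policy that lets the choice track changes in input or system state. A system could instead re-explore only when the monitored cost of the current variant degrades, trading slower detection for lower measurement overhead.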