High-performance scientific computing relies increasingly on high-level, large-scale, object-oriented software frameworks to manage both algorithmic complexity and the complexities of parallelism: distributed data management, process management, inter-process communication, and load balancing. This encapsulation of data management, together with the prescribed semantics of a typical fundamental component of such frameworks (a parallel or serial array class library), provides an opportunity for increasingly sophisticated compile-time optimization techniques. This paper describes two optimizing transformations suitable for certain classes of numerical algorithms, one that reduces the cost of inter-processor communication and one that improves cache utilization; demonstrates and analyzes the resulting performance gains; and indicates how these transformations are being automated.