High-performance computer architecture
High-performance computer architecture
Stencils and problem partitionings: their influence on the performance of multiple processor systems
IEEE Transactions on Computers
On the problem of optimizing data transfers for complex memory systems
ICS '88 Proceedings of the 2nd international conference on Supercomputing
The effect of sharing on the cache and bus performance of parallel programs
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Image Flow Segmentation and Estimation by Constraint Line Clustering
IEEE Transactions on Pattern Analysis and Machine Intelligence
POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Minimum Distance: A Method for Partitioning Recurrences for Multiprocessors
IEEE Transactions on Computers
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
Computer architecture: a quantitative approach
Computer architecture: a quantitative approach
The influence of parallel decomposition strategies on the performance of multiprocessor systems
ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
Iteration Space Tiling for Memory Hierarchies
Proceedings of the Third SIAM Conference on Parallel Processing for Scientific Computing
Multiprocessors: discussion of some theoretical and practical problems
Multiprocessors: discussion of some theoretical and practical problems
On program restructuring, scheduling, and communication for parallel processor systems
On program restructuring, scheduling, and communication for parallel processor systems
Data relocation and prefetching for programs with large data sets
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Optimal Data Scheduling for Uniform Multidimensional Applications
IEEE Transactions on Computers
Determining the idle time of a tiling
Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
An Efficient Solution to the Cache Thrashing Problem Caused by True Data Sharing
IEEE Transactions on Computers
Optimal tiling for minimizing communication in distributed shared-memory multiprocessors
Compiler optimizations for scalable parallel systems
Hierarchical Compilation of Macro Dataflow Graphs for Multiprocessors with Local Memory
IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Parallel and Distributed Systems
Hi-index | 0.01 |
Adaptive data partitioning (ADP) which reduces the execution time of parallel programs by reducing interprocessor communication for iterative parallel loops is discussed. It is shown that ADP can be integrated into a communication-reducing back end for existing parallelizing compilers or as part of a machine-specific partitioner for parallel programs. A multiprocessor model to analyze program execution factors that lead to interprocessor communication and a model for the iterative parallel loop to quantify communication patterns within a program are defined. A vector notation is chosen to quantify communication across a global data set. Communication parameters are computed by examining the indexes of array accesses and are adjusted to reflect the underlying system architecture by compensating for cache line sizes. These values are used to generate rectangular and hexagonal partitions that reduce interprocessor communication.