The emergence of multicore processors has increased the need for simple parallel programming models usable by nonexperts. The ability to specify subparts of a larger data structure is an important trait of High Productivity Programming Languages. The same concept can be applied to dependency-aware task-parallel programming models. In those paradigms, tasks may have data dependencies, which the runtime uses to schedule them in parallel. However, calculating dependencies between subparts of larger data structures is challenging. Accessed data may be strided and may fully or partially overlap the accesses of other tasks. Techniques that are too approximate may report spurious dependencies and limit parallelism. Techniques that are too precise may be impractical in terms of time and space. We present the abstractions, data structures, and algorithms to calculate dependencies between tasks with strided and possibly different memory access patterns. Our technique runs at run time from a description of the inputs and outputs of each task and is affected by neither pointer arithmetic nor reshaping. We demonstrate how it can be applied to increase programming productivity. We also demonstrate that its scalability is comparable to that of other solutions and in some cases higher, owing to better parallelism extraction.
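To make the problem concrete, the following is a minimal sketch of dependency detection between tasks whose inputs and outputs are strided memory accesses. It is not the data structure or algorithm presented in the paper: the `StridedAccess` descriptor, the `Task` record, and the brute-force `overlaps` test are all hypothetical names chosen for illustration, and the quadratic interval comparison is exactly the kind of overly precise approach that may not scale. The example assumes a 4x8 byte matrix stored row-major at address 0.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class StridedAccess:
    """Hypothetical descriptor: `count` blocks of `length` bytes, `stride` bytes apart."""
    base: int
    length: int
    stride: int
    count: int

    def intervals(self) -> Iterator[Tuple[int, int]]:
        # Yield each contiguous block as a half-open byte interval [start, end).
        for i in range(self.count):
            start = self.base + i * self.stride
            yield (start, start + self.length)


def overlaps(a: StridedAccess, b: StridedAccess) -> bool:
    """Naive O(n*m) test: does any block of `a` intersect any block of `b`?"""
    return any(s1 < e2 and s2 < e1
               for s1, e1 in a.intervals()
               for s2, e2 in b.intervals())


@dataclass
class Task:
    """Hypothetical task record: a name plus declared inputs and outputs."""
    name: str
    inputs: List[StridedAccess]
    outputs: List[StridedAccess]


def depends_on(later: Task, earlier: Task) -> bool:
    """True if `later` must wait for `earlier` (read-after-write, write-after-read, or write-after-write)."""
    raw = any(overlaps(r, w) for r in later.inputs for w in earlier.outputs)
    war = any(overlaps(w, r) for w in later.outputs for r in earlier.inputs)
    waw = any(overlaps(w1, w2) for w1 in later.outputs for w2 in earlier.outputs)
    return raw or war or waw


# 4x8 byte matrix, row-major, starting at address 0 (assumed layout).
left_half = StridedAccess(base=0, length=4, stride=8, count=4)   # columns 0..3 of every row
right_half = StridedAccess(base=4, length=4, stride=8, count=4)  # columns 4..7 of every row
row_one = StridedAccess(base=8, length=8, stride=8, count=1)     # all of row 1

write_left = Task("write_left", inputs=[], outputs=[left_half])
write_right = Task("write_right", inputs=[], outputs=[right_half])
read_row = Task("read_row", inputs=[row_one], outputs=[])

print(depends_on(write_right, write_left))  # False: the two halves interleave but never touch
print(depends_on(read_row, write_left))     # True: row 1 contains bytes written by write_left
```

The two half-matrix writers span nearly the same address range (bytes 0 to 27 and 4 to 31), so a test based only on bounding ranges would serialize them. The block-level test shows they are independent, which illustrates the trade-off the abstract describes between approximate and precise techniques.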