A popular approach to providing nonexperts in parallel computing with an easy-to-use programming model is to design a software library consisting of a set of preparallelized routines, and to hide the intricacies of parallelization behind the library's API. However, for regular domain problems (such as simple matrix manipulations or low-level image processing applications, in which all elements in a regular subset of a dense data field are accessed in turn), the speedup obtained with many such library-based parallelization tools is often suboptimal. This is because interoperation optimization (that is, time optimization of communication steps across library calls) is generally not incorporated in the library implementations. This paper presents a simple, efficient, finite-state-machine-based approach to communication minimization for library-based data-parallel regular domain problems. In this approach, referred to as lazy parallelization, a sequential program is parallelized automatically at run time by inserting communication primitives and memory management operations whenever necessary. Apart from being simple and cheap, lazy parallelization is guaranteed to generate legal, correct, and efficient parallel programs at all times. The effectiveness of the approach is demonstrated by analyzing the performance characteristics of two typical regular domain problems from the field of low-level image processing. Experimental results show significant performance improvements over nonoptimized parallel applications. Moreover, the obtained communication behavior is found to be optimal with respect to the abstraction level of message passing programs.
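The core idea above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the state names, the operation classification table, and the communication steps are hypothetical stand-ins. A finite state machine tracks the distribution state of a data structure across library calls, and a communication step is inserted only when an operation's required state differs from the current one, so redundant transfers between consecutive calls are skipped.

```python
# Hypothetical distribution states a data structure can be in.
HOST_ONLY = "host_only"    # valid copy on the host node only
SCATTERED = "scattered"    # partitioned across worker nodes

# Required input state for each (hypothetical) operation class.
REQUIRES = {
    "unary_pixel_op": SCATTERED,
    "global_reduce_op": SCATTERED,
    "sequential_op": HOST_ONLY,
}

# Distribution state each operation class leaves the data in.
PRODUCES = {
    "unary_pixel_op": SCATTERED,
    "global_reduce_op": HOST_ONLY,
    "sequential_op": HOST_ONLY,
}

class LazyParallelizer:
    """Tracks distribution state; inserts communication only on mismatch."""

    def __init__(self):
        self.state = HOST_ONLY
        self.trace = []  # communication steps actually inserted

    def _communicate(self, target):
        # In a real message-passing system this would issue a
        # scatter/gather/broadcast; here we just record the transition.
        self.trace.append(f"{self.state}->{target}")
        self.state = target

    def run(self, op):
        needed = REQUIRES[op]
        if self.state != needed:   # lazy: communicate only when required
            self._communicate(needed)
        self.state = PRODUCES[op]

# A pipeline of two pixel operations followed by a global reduction:
# the data is scattered once, and no redundant transfer is inserted
# between the two consecutive pixel operations.
lp = LazyParallelizer()
for op in ["unary_pixel_op", "unary_pixel_op", "global_reduce_op"]:
    lp.run(op)
print(lp.trace)
```

Running the pipeline inserts a single scatter step; a naive per-call parallelization would instead scatter and gather around every operation, which is exactly the cross-call redundancy that interoperation optimization removes.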