This paper proposes guided self-scheduling, a new approach for scheduling arbitrarily nested parallel program loops on shared-memory multiprocessor systems. Exploiting loop parallelism is crucial to achieving high system and program performance. Because of its simplicity, guided self-scheduling is particularly well suited to implementation on real parallel machines. The method simultaneously achieves the two most important objectives: load balancing and very low synchronization overhead. For certain types of loops, we show analytically that guided self-scheduling incurs minimal overhead and achieves optimal schedules. Two other interesting properties of the method are its insensitivity to the initial processor configuration (in time) and its parameterized nature, which allows it to be tuned for different systems. Finally, we discuss experimental results that show the advantage of guided self-scheduling over the most widely known dynamic methods.
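The abstract does not spell out the chunking rule, so the following is only a minimal sketch of the rule as it is commonly described: each time a processor becomes idle, it takes roughly 1/p of the iterations that remain, where p is the number of processors. The function name `gss_chunks` and the parameter `min_chunk` (standing in for the tuning parameter the abstract mentions) are hypothetical names chosen for illustration, and the sketch computes chunk sizes sequentially rather than modelling real synchronization.

```python
def gss_chunks(n_iterations, n_processors, min_chunk=1):
    """Return (start, size) chunks in the order idle processors would claim them.

    Assumed rule (not stated in the abstract): an idle processor claims
    ceil(remaining / n_processors) iterations, but never fewer than
    min_chunk unless fewer than min_chunk iterations are left.
    """
    if n_iterations < 0 or n_processors < 1 or min_chunk < 1:
        raise ValueError("need n_iterations >= 0, n_processors >= 1, min_chunk >= 1")

    chunks = []
    start = 0
    remaining = n_iterations
    while remaining > 0:
        # Integer ceiling division avoids floating-point rounding.
        size = -(-remaining // n_processors)
        # Enforce the minimum chunk size, then clamp to what is left.
        size = min(max(size, min_chunk), remaining)
        chunks.append((start, size))
        start += size
        remaining -= size
    return chunks


if __name__ == "__main__":
    for start, size in gss_chunks(100, 4):
        print(f"iterations {start}..{start + size - 1} ({size})")
```

Under this assumed rule, early chunks are large, which keeps the number of scheduling operations low, and later chunks shrink toward single iterations, which lets processors finish at nearly the same time. Raising `min_chunk` trades some of that balance for fewer scheduling operations.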