LogP: towards a realistic model of parallel computation

Authors:
David Culler; Richard Karp; David Patterson; Abhijit Sahay; Klaus Erik Schauser; Eunice Santos; Ramesh Subramonian; Thorsten von Eicken
Venue:
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Year:
1993

References: 22 (listed first)
Cited by: 373 (citing works listed after the references)

Randomized and deterministic simulations of PRAMs by parallel machines with restricted granularity of parallel memories

Acta Informatica
Type architectures, shared memory, and the corollary of modest potential

Annual review of computer science vol. 1, 1986
Towards an architecture-independent analysis of parallel algorithms

STOC '88 Proceedings of the twentieth annual ACM symposium on Theory of computing
On communication latency in PRAM computations

SPAA '89 Proceedings of the first annual ACM symposium on Parallel algorithms and architectures
A more practical PRAM model

SPAA '89 Proceedings of the first annual ACM symposium on Parallel algorithms and architectures
The APRAM: incorporating asynchrony into the PRAM model

SPAA '89 Proceedings of the first annual ACM symposium on Parallel algorithms and architectures
Efficient parallel algorithms can be made robust

Proceedings of the eighth annual ACM Symposium on Principles of distributed computing
Computer architecture: a quantitative approach
Communication complexity of PRAMs

Theoretical Computer Science - Special issue: Fifteenth international colloquium on automata, languages and programming, Tampere, Finland, July 1988
A bridging model for parallel computation

Communications of the ACM
Performance Analysis of k-ary n-cube Interconnection Networks

IEEE Transactions on Computers
Efficient robust parallel computations

STOC '90 Proceedings of the twenty-second annual ACM symposium on Theory of computing
Efficient PRAM simulation on a distributed memory machine

STOC '92 Proceedings of the twenty-fourth annual ACM symposium on Theory of computing
The Stanford Dash Multiprocessor

Computer
Ultracomputers: a teraflop before its time

Communications of the ACM
Active messages: a mechanism for integrated communication and computation

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Designing broadcasting algorithms in the postal model for message-passing systems

SPAA '92 Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures
Monsoon: an explicit token-store architecture

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Parallelism in random access machines

STOC '78 Proceedings of the tenth annual ACM symposium on Theory of computing
LogP: Towards a Realistic Model of Parallel Computation
Optimal Broadcast and Summation in the LogP Model
Hiding Communication Costs in Bandwidth-Limited Parallel FFT

Parallel algorithms column 1: models of computation

ACM SIGACT News
Optimal broadcast and summation in the LogP model

SPAA '93 Proceedings of the fifth annual ACM symposium on Parallel algorithms and architectures
Contention in shared memory algorithms

STOC '93 Proceedings of the twenty-fifth annual ACM symposium on Theory of computing
Modeling communication in parallel algorithms: a fruitful interaction between theory and systems?

SPAA '94 Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures
An architecture for optimal all-to-all personalized communication

SPAA '94 Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures
Efficient algorithms for all-to-all communications in multi-port message-passing systems

SPAA '94 Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures
Maya: a simulation platform for distributed shared memories

PADS '94 Proceedings of the eighth workshop on Parallel and distributed simulation
An approach to scalability study of shared memory parallel systems

SIGMETRICS '94 Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Trade-offs between communication throughput and parallel time

STOC '94 Proceedings of the twenty-sixth annual ACM symposium on Theory of computing
Where is time spent in message-passing and shared-memory programs?

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
A performance evaluation of lock-free synchronization protocols

PODC '94 Proceedings of the thirteenth annual ACM symposium on Principles of distributed computing
CCL: A Portable and Tunable Collective Communication Library for Scalable Parallel Computers

IEEE Transactions on Parallel and Distributed Systems
High-level optimization via automated statistical modeling

PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Parallel algorithms for image histogramming and connected components with an experimental study (extended abstract)

PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Performance predictions for parallel diagonal-implicitly iterated Runge-Kutta methods

PADS '95 Proceedings of the ninth workshop on Parallel and distributed simulation
A randomized parallel 3D convex hull algorithm for coarse grained multicomputers

Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
Accounting for memory bank contention and delay in high-bandwidth multiprocessors

Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
LogGP: incorporating long messages into the LogP model—one step closer towards a realistic model for parallel computation

Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
Parallel sorting with limited bandwidth

Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
Universal congestion control for meshes

Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
The interaction of parallel and sequential workloads on a network of workstations

Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Parallelism in sequential functional languages

FPCA '95 Proceedings of the seventh international conference on Functional programming languages and computer architecture
Upper time bounds for executing PRAM-programs on the LogP-machine

ICS '95 Proceedings of the 9th international conference on Supercomputing
On the Design and Implementation of Broadcast and Global Combine Operations Using the Postal Model

IEEE Transactions on Parallel and Distributed Systems
Practical parallel algorithms for personalized communication and integer sorting

Journal of Experimental Algorithmics (JEA)
Evaluation of architectural support for global address-based communication in large-scale parallel machines

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Towards efficiency and portability: programming with the BSP model

Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
BSP vs LogP

Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
Asynchronous shared memory search structures

Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
On multiprocessor system scheduling

Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
Parallel algorithms for personalized communication and sorting with an experimental study (extended abstract)

Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
Synchronization hardware for networks of workstations: performance vs. cost

ICS '96 Proceedings of the 10th international conference on Supercomputing
Communication-efficient parallel sorting (preliminary version)

STOC '96 Proceedings of the twenty-eighth annual ACM symposium on Theory of computing
LogP: a practical model of parallel computation

Communications of the ACM
Fast Parallel Sorting Under LogP: Experience with the CM-5

IEEE Transactions on Parallel and Distributed Systems
The Block Distributed Memory Model

IEEE Transactions on Parallel and Distributed Systems
A quantitative comparison of parallel computation models

Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
A bridging model for parallel computation, communication, and I/O

ACM Computing Surveys (CSUR) - Special issue: position statements on strategic directions in computing research
Can shared-memory model serve as a bridging model for parallel computation?

Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Modeling parallel bandwidth: local vs. global restrictions

Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Triplex: a multi-class routing algorithm

Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Fine-grain multithreading with the EM-X multiprocessor

Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Performance considerations in software multicasts

ICS '97 Proceedings of the 11th international conference on Supercomputing
Performance implications of communication mechanisms in all-software global address space systems

PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
LoPC: modeling contention in parallel algorithms

PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Effects of communication latency, overhead, and bandwidth in a cluster architecture

Proceedings of the 24th annual international symposium on Computer architecture
Accounting for Memory Bank Contention and Delay in High-Bandwidth Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Efficient Algorithms for the Reduce-Scatter Operation in LogGP

IEEE Transactions on Parallel and Distributed Systems
Exploiting local data in parallel array I/O on a practical network of workstations

Proceedings of the fifth workshop on I/O in parallel and distributed systems
Maintaining dynamic geometric objects on parallel processors

PRS '97 Proceedings of the IEEE symposium on Parallel rendering
Generating an Efficient Broadcast Sequence Using Reflected Gray Codes

IEEE Transactions on Parallel and Distributed Systems
Efficient Algorithms for All-to-All Communications in Multiport Message-Passing Systems

IEEE Transactions on Parallel and Distributed Systems
Abstractions for Portable, Scalable Parallel Programming

IEEE Transactions on Parallel and Distributed Systems
Communication-optimal parallel minimum spanning tree algorithms (extended abstract)

Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
Computational bounds for fundamental problems on general-purpose parallel models

Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
Load balanced parallel radix sort

ICS '98 Proceedings of the 12th international conference on Supercomputing
Scheduling with implicit information in distributed systems

SIGMETRICS '98/PERFORMANCE '98 Proceedings of the 1998 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
LoGPC: modeling network contention in message-passing programs

SIGMETRICS '98/PERFORMANCE '98 Proceedings of the 1998 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Support for Efficient Programming on the SB-PRAM

International Journal of Parallel Programming
Models and languages for parallel computation

ACM Computing Surveys (CSUR)
A quantitative comparison of parallel computation models

ACM Transactions on Computer Systems (TOCS)
A new deterministic parallel sorting algorithm with an experimental evaluation

Journal of Experimental Algorithmics (JEA)
A Gracefully Degrading Massively Parallel System Using the BSP Model, and Its Evaluation

IEEE Transactions on Computers
An Application-Driven Study of Parallel System Overheads and Network Bandwidth Requirements

IEEE Transactions on Parallel and Distributed Systems
Scaling application performance on a cache-coherent multiprocessor

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
MagPIe: MPI's collective communication operations for clustered wide area systems

Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Predictive analysis of a wavefront application using LogGP

Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
NFS sensitivity to high performance networks

SIGMETRICS '99 Proceedings of the 1999 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Resource Scaling Effects on MPP Performance: The STAP Benchmark Implications

IEEE Transactions on Parallel and Distributed Systems
Problem space promotion and its evaluation as a technique for efficient parallel computation

ICS '99 Proceedings of the 13th international conference on Supercomputing
BOS is boss: a case for bulk-synchronous object systems

Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
Preemptive scheduling of parallel jobs on multiprocessors

Proceedings of the seventh annual ACM-SIAM symposium on Discrete algorithms
Randomized fully-scalable BSP techniques for multi-searching and convex hull construction

SODA '97 Proceedings of the eighth annual ACM-SIAM symposium on Discrete algorithms
The QRQW PRAM: accounting for contention in parallel algorithms

SODA '94 Proceedings of the fifth annual ACM-SIAM symposium on Discrete algorithms
Emulations between QSM, BSP, and LogP: a framework for general-purpose parallel algorithm design

Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms
On the partitionability of hierarchical radiosity

PVGS '99 Proceedings of the 1999 IEEE symposium on Parallel visualization and graphics
Fault-Tolerant Communication Algorithms in Toroidal Networks

IEEE Transactions on Parallel and Distributed Systems
Portable and Efficient Parallel Computing Using the BSP Model

IEEE Transactions on Computers
Coarse grained parallel computing on heterogeneous systems

SAC '98 Proceedings of the 1998 ACM symposium on Applied Computing
Improved parallel and sequential walking tree methods for biological string alignments

SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Toward Formally-Based Design of Message Passing Programs

IEEE Transactions on Software Engineering - Special issue on architecture-independent languages and software tools for parallel processing
A Design Methodology for Data-Parallel Applications

IEEE Transactions on Software Engineering - Special issue on architecture-independent languages and software tools for parallel processing
A Transformation Approach to Derive Efficient Parallel Implementations

IEEE Transactions on Software Engineering - Special issue on architecture-independent languages and software tools for parallel processing
Scatter and gather operations on an asynchronous communication model

SAC '00 Proceedings of the 2000 ACM symposium on Applied computing - Volume 2
A formal model for the parallel semantics of P3L

SAC '00 Proceedings of the 2000 ACM symposium on Applied computing - Volume 2
An Estimation of Complexity and Computational Costs for Vertical Block-Cyclic Distributed Parallel LU Factorization

The Journal of Supercomputing
Profiling a parallel language based on fine-grained communication

Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Automatically tuned collective communications

Proceedings of the 2000 ACM/IEEE conference on Supercomputing
H-BSP: A Hierarchical BSP Computation Model

The Journal of Supercomputing
Parallelizing the Murϕ Verifier

Formal Methods in System Design - Special issue on CAV '97
LoGPC: Modeling Network Contention in Message-Passing Programs

IEEE Transactions on Parallel and Distributed Systems
Towards practical deterministic write-all algorithms

Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
LogGPS: a parallel computational model for synchronization analysis

PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Tolerating communication latency through dynamic thread invocation in a multithreaded architecture

Compiler optimizations for scalable parallel systems
Implicit coscheduling: coordinated scheduling with implicit information in distributed systems

ACM Transactions on Computer Systems (TOCS)
SimpleFit: A Framework for Analyzing Design Trade-Offs in Raw Architectures

IEEE Transactions on Parallel and Distributed Systems
An implementation and analysis of the virtual interface architecture

SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Communication overlap in multi-tier parallel algorithms

SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Multi-protocol active messages on a cluster of SMP's

SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Performance prediction for random write reductions: a case study in modeling shared memory programs

SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Near-optimal adaptive control of a large grid application

ICS '02 Proceedings of the 16th international conference on Supercomputing
P-3PC: A Point-to-Point Communication Model for Automatic and Optimal Decomposition of Regular Domain Problems

IEEE Transactions on Parallel and Distributed Systems
Evaluating the running time of a communication round over the internet

Proceedings of the twenty-first annual symposium on Principles of distributed computing
Use of a CORBA/RMI gateway: characterization of communication overhead

WOSP '02 Proceedings of the 3rd international workshop on Software and performance
On scheduling send-graphs and receive-graphs under the LogP-model

Information Processing Letters
Parallel FFT on ATM-based networks of workstations

Cluster Computing
Exploiting Locality in Single Assignment Data Structures Updated Through Split-Phase Transactions

Cluster Computing
Architectural differences of efficient sequential and parallel computers

Journal of Systems Architecture: the EUROMICRO Journal
An Application-Driven Study of Multicast Communication for Write Invalidation

The Journal of Supercomputing
Adaptive parallel computing on heterogeneous networks with mpC

Parallel Computing
Performance-steered design of software architectures for embedded multicore systems

Software—Practice & Experience
A software architecture for user transparent parallel image processing

Parallel Computing - Parallel computing in image and video processing
Modeling Communication Overhead: MPI and MPL Performance on the IBM SP2

IEEE Parallel & Distributed Technology: Systems & Technology
Collective Communication in Wormhole-Routed Massively Parallel Computers

Computer
A Case for NOW (Networks of Workstations)

IEEE Micro
Assessing Fast Network Interfaces

IEEE Micro
Computing Global Combine Operations in the Multiport Postal Model

IEEE Transactions on Parallel and Distributed Systems
POEMS: End-to-End Performance Design of Large Parallel Adaptive Computational Systems

IEEE Transactions on Software Engineering
Symbolic Performance Modeling of Parallel Systems

IEEE Transactions on Parallel and Distributed Systems
Improving the Throughput of Remote Storage Access through Pipelining

GRID '02 Proceedings of the Third International Workshop on Grid Computing
Towards an Accurate Model for Collective Communications

ICCS '01 Proceedings of the International Conference on Computational Sciences-Part I
Fault Tolerant MPI for the HARNESS Meta-computing System

ICCS '01 Proceedings of the International Conference on Computational Sciences-Part I
Parallel Models and Job Characterization for System Scheduling

ICCS '01 Proceedings of the International Conference on Computational Science-Part II
Parallel Algorithm Design with Coarse-Grained Synchronization

ICCS '01 Proceedings of the International Conference on Computational Science-Part II
An Analytical Model for a Class of Architectures under Master-Slave Paradigm

HPCN Europe 2000 Proceedings of the 8th International Conference on High-Performance Computing and Networking
Skel-BSP: Performance Portability for Skeletal Programming

HPCN Europe 2000 Proceedings of the 8th International Conference on High-Performance Computing and Networking
Source Code and Task Graphs in Program Optimization

HPCN Europe 2001 Proceedings of the 9th International Conference on High-Performance Computing and Networking
ECO: Efficient Collective Operations for Communication on Heterogeneous Networks

IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Parallel Implementation of Borůvka's Minimum Spanning Tree Algorithm

IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
A Randomized Algorithm for Voronoi Diagram of Line Segments on Coarse-Grained Multiprocessors

IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Parallel 'Go with the Winners' Algorithms in the LogP Model

IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
A 2-D Parallel Convex Hull Algorithm with Optimal Communication Phases

IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Optimizing Parallel Bitonic Sort

IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Architecture-Dependent Tuning of the Parameterized Communication Model for Optimal Multicasting

IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
A Parallel LCC Simulation System

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Exploiting Hierarchy in Heterogeneous Environments

IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Predicting Scalability of Parallel Garbage Collectors on Shared Memory Multiprocessors

IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
On the Design of Clustering-based Scheduling Algorithms for Realistic Machine Models

IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Fast Measurement of LogP Parameters for Message Passing Platforms

IPDPS '00 Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing
A Hardware Implementation of PRAM and Its Performance Evaluation

IPDPS '00 Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing
VIBe: A Micro-benchmark Suite for Evaluating Virtual Interface Architecture (VIA) Implementations

IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Cost Hierarchies for Abstract Parallel Machines

LCPC '00 Proceedings of the 13th International Workshop on Languages and Compilers for Parallel Computing-Revised Papers
kappa NUMA: A Model for Clusters of SMP-Machines

PPAM '01 Proceedings of the th International Conference on Parallel Processing and Applied Mathematics-Revised Papers
Packet Bundling

SWAT '02 Proceedings of the 8th Scandinavian Workshop on Algorithm Theory
RP*: A Family of Order Preserving Scalable Distributed Data Structures

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Implementation and Analysis of a Parallel Collection Query Language

VLDB '96 Proceedings of the 22nd International Conference on Very Large Data Bases
Asynchronous Random Polling Dynamic Load Balancing

ISAAC '99 Proceedings of the 10th International Symposium on Algorithms and Computation
Improving Parallel Job Scheduling Using Runtime Measurements

IPDPS '00/JSSPP '00 Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing
On Minimising the Processor Requirements of LogP Schedules

Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
VizzScheduler - A Framework for the Visualization of Scheduling Algorithms

Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
Symbolic Cost Estimation of Parallel Applications

Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Optimal Scheduling Algorithms for Communication Constrained Parallel Processing

Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
On Scheduling Task-Graphs to LogP-Machines with Disturbances

Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Scheduling Arbitrary Task Graphs on LogP Machines

Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Scheduling Iterative Programs onto LogP-Machine

Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Parallel Image Processing System on a Cluster of Personal Computers (Best Student Paper Award: First Prize)

VECPAR '00 Selected Papers and Invited Talks from the 4th International Conference on Vector and Parallel Processing
Graph Coloring on a Coarse Grained Multiprocessor

WG '00 Proceedings of the 26th International Workshop on Graph-Theoretic Concepts in Computer Science
Handling Graphs According to a Coarse Grained Approach: Experiments with PVM and MPI

Proceedings of the 7th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Validation of Dimemas Communication Model for MPI Collective Operations

Proceedings of the 7th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
ACCT: Automatic Collective Communications Tuning

Proceedings of the 7th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Adaptive Execution of Pipelines

Proceedings of the 8th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Architecture Independent Analysis of Parallel Programs

ICCS '01 Proceedings of the International Conference on Computational Science-Part II
A Customizable Simulator for Workstation Networks

IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
A Cost Model for Asynchronous and Structured Message Passing

Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Towards the automatic optimal mapping of pipeline algorithms

Parallel Computing
HARNESS fault tolerant MPI design, usage and performance issues

Future Generation Computer Systems - Grid computing: Towards a new computing infrastructure
EPPP - an integrated environment for portable parallel programming

CASCON '94 Proceedings of the 1994 conference of the Centre for Advanced Studies on Collaborative research
Automatic decomposition in EPPP compiler

CASCON '94 Proceedings of the 1994 conference of the Centre for Advanced Studies on Collaborative research
High performance RDMA-based MPI implementation over InfiniBand

ICS '03 Proceedings of the 17th annual international conference on Supercomputing
A3: a simple and asymptotically accurate model for parallel computation

FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation
Abstracting network characteristics and locality properties of parallel systems

HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Evaluation of Various Node Configurations for Fine-grain Multithreading on Stock Processors

HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
On finding optimal clusterings of task graphs

PAS '95 Proceedings of the First Aizu International Symposium on Parallel Algorithms/Architecture Synthesis
Algorithm engineering for parallel computation

Experimental algorithmics
MPICH-G2: a Grid-enabled implementation of the Message Passing Interface

Journal of Parallel and Distributed Computing - Special issue on computational grids
A high-performance communication service for parallel computing on distributed DSP systems

Parallel Computing
Experimental Validation of Parallel Computation Models on the Intel Paragon

IPPS '98 Proceedings of the 12th International Parallel Processing Symposium on International Parallel Processing Symposium
Predicting the Running Times of Parallel Programs by Simulation

IPPS '98 Proceedings of the 12th International Parallel Processing Symposium on International Parallel Processing Symposium
Managing Concurrent Access for Shared Memory Active Messages

IPPS '98 Proceedings of the 12th International Parallel Processing Symposium on International Parallel Processing Symposium
Latency, Occupancy, and Bandwidth in DSM Multiprocessors: A Performance Evaluation

IEEE Transactions on Computers
Contention-Aware Communication Schedule for High-Speed Communication

Cluster Computing
On the elusive benefits of protocol offload

NICELI '03 Proceedings of the ACM SIGCOMM workshop on Network-I/O convergence: experience, lessons, implications
Graph coloring on coarse grained multicomputers

Discrete Applied Mathematics - Special issue: The second international colloquium, "journées de l'informatique messine"
Parallel 'go with the winners' algorithms in distributed memory models

Journal of Parallel and Distributed Computing - Special section best papers from the 2002 international parallel and distributed processing symposium
Efficient implementation of reduce-scatter in MPI

Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Parallel, distributed and network-based processing
Incorporating memory layout in the modeling of message passing programs

Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Parallel, distributed and network-based processing
Data-parallel polygonization

Parallel Computing - Special issue: High performance computing with geographical data
Optimal broadcast on parallel locality models

Journal of Discrete Algorithms
Emulations between QSM, BSP and LogP: a framework for general-purpose parallel algorithm design

Journal of Parallel and Distributed Computing
Parallel program performance prediction using deterministic task graph analysis

ACM Transactions on Computer Systems (TOCS)
Performance modelling for task-parallel programs

Performance analysis and grid computing
Quantification of memory communication

High performance scientific and engineering computing
Opportunities and challenges in application-tuned circuits and architectures based on nanodevices

Proceedings of the 1st conference on Computing frontiers
Dynamic TCP acknowledgment in the LogP model

Journal of Algorithms
A tool for the design and evaluation of hybrid scheduling algorithms for computational grids

MGC '04 Proceedings of the 2nd workshop on Middleware for grid computing
Predicting the Performance of Synchronous Discrete Event Simulation

IEEE Transactions on Parallel and Distributed Systems
Configuration of distributed message converter systems

Performance Evaluation
Performance Comparison of MPI Implementations over InfiniBand, Myrinet and Quadrics

Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Scalable NIC-based Reduction on Large-scale Clusters

Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Coarse grained gather and scatter operations with applications

Journal of Parallel and Distributed Computing
On performance analysis of heterogeneous parallel algorithms

Parallel Computing
On searching strategies, parallel questions, and delayed answers

Discrete Applied Mathematics - Fun with algorithms 2 (FUN 2001)
A Hardware Acceleration Unit for MPI Queue Processing

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Enhancing NIC Performance for MPI using Processing-in-Memory

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 9 - Volume 10
System Support to Balance the Resource Supply and Demand in High-end Computing

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 10 - Volume 11
A Method for MPI Broadcast in Computational Grids

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 13 - Volume 14
Stream PRAM

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 14 - Volume 15
Performance Analysis of MPI Collective Operations

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 15 - Volume 16
PACE: A Toolset for the Performance Prediction of Parallel and Distributed Systems

International Journal of High Performance Computing Applications
A tight analysis and near-optimal instances of the algorithm of Anderson and Woll

Theoretical Computer Science
A Coarse-Grained multicomputer algorithm for the detection of repetitions

Information Processing Letters
Communication Contention in Task Scheduling

IEEE Transactions on Parallel and Distributed Systems
Broadcasting on networks of workstations

Proceedings of the seventeenth annual ACM symposium on Parallelism in algorithms and architectures
The design and implementation of LilyTask in shared memory

ACM SIGOPS Operating Systems Review
A methodology for detailed performance modeling of reduction computations on SMP machines

Performance Evaluation - Performance modelling and evaluation of high-performance parallel and distributed systems
Towards an Accurate Model for Collective Communications

International Journal of High Performance Computing Applications
OS support for a commodity database on PC clusters: distributed devices vs. distributed file systems

ADC '05 Proceedings of the 16th Australasian database conference - Volume 39
Making the Most Out of Direct-Access Network Attached Storage

FAST '03 Proceedings of the 2nd USENIX Conference on File and Storage Technologies
HUNTing the Overlap

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
The MHETA Execution Model for Heterogeneous Clusters

SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Cross-Platform Performance Prediction of Parallel Applications Using Partial Execution

SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Performance Modeling and Tuning Strategies of Mixed Mode Collective Communications

SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Toward a Realistic Task Scheduling Model

IEEE Transactions on Parallel and Distributed Systems
High performance RDMA-based MPI implementation over InfiniBand

International Journal of Parallel Programming - Special issue I: The 17th annual international conference on supercomputing (ICS'03)
MOLAR: adaptive runtime support for high-end computing operating and runtime systems

ACM SIGOPS Operating Systems Review
A detailed MPI communication model for distributed systems

Future Generation Computer Systems
The master-slave paradigm on heterogeneous systems: a dynamic programming approach for the optimal mapping

Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Parallel, distributed and network-based processing
Self-adapting numerical software (SANS) effort

IBM Journal of Research and Development
PEMPIs: a new methodology of modeling and prediction of MPI programs performance

International Journal of Parallel Programming
Modeling instruction placement on a spatial architecture

Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures
The cache complexity of multithreaded cache oblivious algorithms

Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures
A Parallel Independent Component Analysis Algorithm

ICPADS '06 Proceedings of the 12th International Conference on Parallel and Distributed Systems - Volume 1
Performance Modeling of Communication and Computation in Hybrid MPI and OpenMP Applications

ICPADS '06 Proceedings of the 12th International Conference on Parallel and Distributed Systems - Volume 2
An Accurate Communication Model of a Heterogeneous Cluster Based on a Switch-Enabled Ethernet Network

ICPADS '06 Proceedings of the 12th International Conference on Parallel and Distributed Systems - Volume 2
A 5/4-approximation algorithm for scheduling identical malleable tasks

Theoretical Computer Science - Approximation and online algorithms
Scalable, fault tolerant membership for MPI tasks on HPC systems

Proceedings of the 20th annual international conference on Supercomputing
MPIPP: an automatic profile-guided parallel process placement toolset for SMP clusters and multiclusters

Proceedings of the 20th annual international conference on Supercomputing
A Parallel Computational Model for Heterogeneous Clusters

IEEE Transactions on Parallel and Distributed Systems
Performance modeling and optimization of a high energy colliding beam simulation code

Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Performance evaluation of high-speed interconnects using dense communication patterns

Parallel Computing
Parallel sparse LU factorization on different message passing platforms

Journal of Parallel and Distributed Computing
A grid-enabled distributed branch-and-bound algorithm with application on the Steiner problem in graphs

Parallel Computing - Optimization on grids - Optimization for grids
Optimizing communication overlap for high-speed networks

Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Asynchronous resource discovery in peer-to-peer networks

Computer Networks: The International Journal of Computer and Telecommunications Networking
The design and development of ZPL

Proceedings of the third ACM SIGPLAN conference on History of programming languages
Performance engineering, PSEs and the GRID

Scientific Programming
Remote memory access: A case for portable, efficient and library independent parallel programming

Scientific Programming
Performance analysis of MPI collective operations

Cluster Computing
An efficient MPI_allgather for grids

Proceedings of the 16th international symposium on High performance distributed computing
High performance combinatorial algorithm design on the Cell Broadband Engine processor

Parallel Computing
PRO: a model for the design and analysis of efficient and scalable parallel algorithms

Nordic Journal of Computing
RAT: a methodology for predicting performance in application design migration to FPGAs

HPRCTA '07 Proceedings of the 1st international workshop on High-performance reconfigurable computing technology and applications: held in conjunction with SC07
An Evaluation of Marenostrum Performance

International Journal of High Performance Computing Applications
Performance without pain = productivity: data layout and collective communication in UPC

Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
NIC-based reduction algorithms for large-scale clusters

International Journal of High Performance Computing and Networking
Performance evaluation of the Sun Fire Link SMP clusters

International Journal of High Performance Computing and Networking
HPM: a hierarchical model for parallel computations

International Journal of High Performance Computing and Networking
Foundations for the integration of scheduling techniques into compilers for parallel languages

International Journal of Computational Science and Engineering
Distributed computing and the multicore revolution

ACM SIGACT News
Implementation and performance analysis of non-blocking collective operations for MPI

Proceedings of the 2007 ACM/IEEE conference on Supercomputing
FPGA-based prototype of a PRAM-on-chip processor

Proceedings of the 5th conference on Computing frontiers
Techniques for pipelined broadcast on ethernet switched clusters

Journal of Parallel and Distributed Computing
Performance portable optimizations for loops containing communication operations

Proceedings of the 22nd annual international conference on Supercomputing
Optimal speedup on a low-degree multi-core parallel architecture (LoPRAM)

Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
Fundamental parallel algorithms for private-cache chip multiprocessors

Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
Cache-efficient dynamic programming algorithms for multicores

Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
Overcoming the processor communication overhead in MPI applications

SpringSim '07 Proceedings of the 2007 spring simulation multiconference - Volume 2
A pilot study to compare programming effort for two parallel programming models

Journal of Systems and Software
A Software Tool for Accurate Estimation of Parameters of Heterogeneous Communication Models

Proceedings of the 15th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Automatic Communication Performance Debugging in PGAS Languages

Languages and Compilers for Parallel Computing
Efficient shared memory and RDMA based collectives on multi-rail QsNetII SMP clusters

Cluster Computing
RAT: RC Amenability Test for Rapid Performance Prediction

ACM Transactions on Reconfigurable Technology and Systems (TRETS)
A unified model for multicore architectures

IFMT '08 Proceedings of the 1st international forum on Next-generation multicore/manycore technologies
On parallel image retrieval with dynamically extracted features

Parallel Computing
On modeling and evaluating multicomputer transcoding architectures for live-video streams

Multimedia Tools and Applications
Boolean circuit programming: A new paradigm to design parallel algorithms

Journal of Discrete Algorithms
Global Principal Typing in Partially Commutative Asynchronous Sessions

ESOP '09 Proceedings of the 18th European Symposium on Programming Languages and Systems: Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009
WARPP: a toolkit for simulating high-performance parallel scientific codes

Proceedings of the 2nd International Conference on Simulation Tools and Techniques
Accurate and Efficient Estimation of Parameters of Heterogeneous Communication Performance Models

International Journal of High Performance Computing Applications
Extracting and predicting the communication behaviour of parallel applications

International Journal of Parallel, Emergent and Distributed Systems
A domain-specific approach for software development on Manycore platforms

ACM SIGARCH Computer Architecture News
Type-Directed Compilation for Multicore Programming

Electronic Notes in Theoretical Computer Science (ENTCS)
Developing parallel programs: A design-oriented perspective

IWMSE '09 Proceedings of the 2009 ICSE Workshop on Multicore Software Engineering
What the parallel-processing community has (failed) to offer the multi/many-core generation

Journal of Parallel and Distributed Computing
Evaluating multicore algorithms on the unified memory model

Scientific Programming - Software Development for Multi-core Computing Systems
Modeling advanced collective communication algorithms on Cell-based systems

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Optimum Broadcasting in Complete Weighted-Vertex Graphs

SOFSEM '10 Proceedings of the 36th Conference on Current Trends in Theory and Practice of Computer Science
Graph coloring on coarse grained multicomputers

Discrete Applied Mathematics
An Adaptive High-Performance Service Architecture

Electronic Notes in Theoretical Computer Science (ENTCS)
An efficient collective communication method for grid scale networks

ICCS'03 Proceedings of the 2003 international conference on Computational science
Model for simulation of heterogeneous high-performance computing environments

VECPAR'06 Proceedings of the 7th international conference on High performance computing for computational science
A coarse-grained multicomputer algorithm for the longest repeated suffix ending at each point in a word

ICCSA'03 Proceedings of the 2003 international conference on Computational science and its applications: Part II
Modeling multigrain parallelism on heterogeneous multi-core processors: a case study of the Cell BE

HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
Algorithmic approach to designing an easy-to-program system: Can it lead to a HW-enhanced programmer's workflow add-on?

ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Low depth cache-oblivious algorithms

Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Low-contention data structures

Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Manycore performance analysis using timed configuration graphs

SAMOS'09 Proceedings of the 9th international conference on Systems, architectures, modeling and simulation
Throughput optimal total order broadcast for cluster environments

ACM Transactions on Computer Systems (TOCS)
mPlogP: A Parallel Computation Model for Heterogeneous Multi-core Computer

CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
LogGOPSim: simulating large-scale applications in the LogGOPS model

Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
The LogP and MLogP models for parallel image processing with multi-core microprocessor

Proceedings of the 2010 Symposium on Information and Communication Technology
Optimizing collective communication on multicores

HotPar'09 Proceedings of the First USENIX conference on Hot topics in parallelism
Design principles for end-to-end multicore schedulers

HotPar'10 Proceedings of the 2nd USENIX conference on Hot topics in parallelism
Estimating parallel performance, a skeleton-based approach

Proceedings of the fourth international workshop on High-level parallel programming and applications
A model of computation for MapReduce

SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
Revisiting Cramer's rule for solving dense linear systems

SpringSim '10 Proceedings of the 2010 Spring Simulation Multiconference
Parallel computation: models and complexity issues

Algorithms and theory of computation handbook
Incorporating memory layout in the modeling of message passing programs

EUROMICRO-PDP'02 Proceedings of the 10th Euromicro conference on Parallel, distributed and network-based processing
Efficient implementation of reduce-scatter in MPI

EUROMICRO-PDP'02 Proceedings of the 10th Euromicro conference on Parallel, distributed and network-based processing
Predictability of bulk synchronous programs using MPI

EURO-PDP'00 Proceedings of the 8th Euromicro conference on Parallel and distributed processing
Towards efficient BSP implementations of BSR programs for some computational geometry problems

EURO-PDP'00 Proceedings of the 8th Euromicro conference on Parallel and distributed processing
Fast barrier synchronization for InfiniBand™

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
LogfP - a model for small messages in InfiniBand

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
A performance model for fine-grain accesses in UPC

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
An economy-driven mapping heuristic for hierarchical master-slave applications in grid systems

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Reining in the outliers in map-reduce clusters using Mantri

OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Performance comparison of some shared memory organizations for 2D mesh-like NOCs

Microprocessors & Microsystems
Reliable performance prediction for multigrid software on distributed memory systems

Advances in Engineering Software
Parallel evaluation of conjunctive queries

Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Scheduling irregular parallel computations on hierarchical caches

Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
An idiom-finding tool for increasing productivity of accelerators

Proceedings of the international conference on Supercomputing
An analytical model for multilevel performance prediction of Multi-FPGA systems

ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Balance principles for algorithm-architecture co-design

HotPar'11 Proceedings of the 3rd USENIX conference on Hot topics in parallelism
Performance modeling for multilevel communication in SHMEM+

Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model
A high performance superpipeline protocol for InfiniBand

Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
Cache size in a cost model for heterogeneous skeletons

Proceedings of the fifth international workshop on High-level parallel programming and applications
A framework for an automatic hybrid MPI+OpenMP code generation

Proceedings of the 19th High Performance Computing Symposia
Performance modeling for systematic performance tuning

State of the Practice Reports
Improving communication performance in dense linear algebra via topology aware collectives

Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Collective communication costs analysis over gigabit ethernet and infiniband

HiPC'06 Proceedings of the 13th international conference on High Performance Computing
A case for non-blocking collective operations

ISPA'06 Proceedings of the 2006 international conference on Frontiers of High Performance Computing and Networking
Optimizing explicit data transfers for data parallel applications on the Cell architecture

ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Measuring MPI send and receive overhead and application availability in high performance network interfaces

EuroPVM/MPI'06 Proceedings of the 13th European PVM/MPI User's Group conference on Recent advances in parallel virtual machine and message passing interface
Challenges and issues in benchmarking MPI

EuroPVM/MPI'06 Proceedings of the 13th European PVM/MPI User's Group conference on Recent advances in parallel virtual machine and message passing interface
An efficient collective communication method using a shortest path algorithm in a computational grid

GCC'05 Proceedings of the 4th international conference on Grid and Cooperative Computing
Design of the force field task assignment method and associated performance evaluation for desktop grids

GCC'05 Proceedings of the 4th international conference on Grid and Cooperative Computing
A network evaluation for LAN, MAN and WAN grid environments

EUC'05 Proceedings of the 2005 international conference on Embedded and Ubiquitous Computing
A CGM algorithm solving the longest increasing subsequence problem

ICCSA'06 Proceedings of the 2006 international conference on Computational Science and Its Applications - Volume Part V
PerWiz: a what-if prediction tool for tuning message passing programs

VECPAR'04 Proceedings of the 6th international conference on High Performance Computing for Computational Science
Message strip-mining heuristics for high speed networks

VECPAR'04 Proceedings of the 6th international conference on High Performance Computing for Computational Science
Adapting distributed scientific applications to run-time network conditions

PARA'04 Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing
Improving multilevel approach for optimizing collective communications in computational grids

EGC'05 Proceedings of the 2005 European conference on Advances in Grid Computing
Modeling execution time of selected computation and communication kernels on grids

EGC'05 Proceedings of the 2005 European conference on Advances in Grid Computing
Reconfigurable scientific applications on GRID services

EGC'05 Proceedings of the 2005 European conference on Advances in Grid Computing
A two-phase scheduling algorithm for efficient collective communications of MPICH-G2

ICDCIT'05 Proceedings of the Second international conference on Distributed Computing and Internet Technology
Low-contention data structures

Journal of Parallel and Distributed Computing
SGL: towards a bridging model for heterogeneous hierarchical platforms

International Journal of High Performance Computing and Networking
Parallel short range molecular dynamics simulations on computer clusters: Performance evaluation and modeling

Mathematical and Computer Modelling: An International Journal
Performance Modeling and Comparative Analysis of the MILC Lattice QCD Application su3_rmd

CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012)
Scalable Multi-purpose Network Representation for Large Scale Distributed System Simulation

CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012)
High-performance RMA-based broadcast on the Intel SCC

Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
SEParAT: scheduling support environment for parallel application task graphs

Cluster Computing
Aspen: a domain specific language for performance modeling

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Communication avoiding and overlapping for numerical linear algebra

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
A scheduling toolkit for multiprocessor-task programming with dependencies

Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing
A new memory slowdown model for the characterization of computing systems

PaCT'07 Proceedings of the 9th international conference on Parallel Computing Technologies
Techniques for designing efficient parallel graph algorithms for SMPs and multicore processors

ISPA'07 Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications
Optimization of collective communications in HeteroMPI

PVM/MPI'07 Proceedings of the 14th European conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Netgauge: a network performance measurement framework

HPCC'07 Proceedings of the Third international conference on High Performance Computing and Communications
Towards a complexity model for design and analysis of PGAS-based algorithms

HPCC'07 Proceedings of the Third international conference on High Performance Computing and Communications
High performance checksum computation for fault-tolerant MPI over infiniband

EuroMPI'12 Proceedings of the 19th European conference on Recent Advances in the Message Passing Interface
Performance modelling of magnetohydrodynamics codes

EPEW'12 Proceedings of the 9th European conference on Computer Performance Engineering
Modeling communication in cache-coherent SMP systems: a case-study with Xeon Phi

Proceedings of the 22nd international symposium on High-performance parallel and distributed computing
Estimating parallel performance

Journal of Parallel and Distributed Computing
A survey of pipelined workflow scheduling: Models and algorithms

ACM Computing Surveys (CSUR)
Modeling synthetic aperture radar computation with Aspen

International Journal of High Performance Computing Applications
On the validity of flow-level TCP network models for grid and cloud simulations

ACM Transactions on Modeling and Computer Simulation (TOMACS)
A memory access model for highly-threaded many-core architectures

Future Generation Computer Systems
The Servet 3.0 benchmark suite: Characterization of network performance degradation

Computers and Electrical Engineering

Abstract

A vast body of theoretical research has focused either on overly simplistic models of parallel computation, notably the PRAM, or on overly specific models that have few representatives in the real world. Both kinds of models encourage exploitation of formal loopholes, rather than rewarding development of techniques that yield performance across a range of current and future parallel machines. This paper offers a new parallel machine model, called LogP, that reflects the critical technology trends underlying parallel computers. It is intended to serve as a basis for developing fast, portable parallel algorithms and to offer guidelines to machine designers. Such a model must strike a balance between detail and simplicity in order to reveal important bottlenecks without making analysis of interesting problems intractable. The model is based on four parameters that specify abstractly the computing bandwidth (the number of processors, P), the communication bandwidth (the gap, g), the communication delay (the latency, L), and the efficiency of coupling communication and computation (the overhead, o). Portable parallel algorithms typically adapt to the machine configuration in terms of these parameters. The utility of the model is demonstrated through examples that are implemented on the CM-5.
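To make the four parameters concrete, here is a minimal Python sketch. The class and function names, the use of processor cycles as the time unit, and the parameter values are illustrative assumptions, not taken from the paper. It charges o + L + o for one small message (send overhead, network latency, receive overhead) and schedules a single-datum broadcast greedily: every processor that already holds the datum forwards it to an uninformed processor as soon as its gap allows. This greedy schedule is commonly described as optimal for broadcasting one datum under LogP, but the code is only a cost sketch, not a CM-5 implementation.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class LogP:
    """The four LogP machine parameters; times in processor cycles (assumed unit)."""
    L: int  # latency: delay of a small message through the network
    o: int  # overhead: processor time consumed to send or to receive one message
    g: int  # gap: minimum interval between consecutive sends on one processor
    P: int  # number of processor/memory modules

    def message_time(self) -> int:
        """Time from the start of a send until the receiver holds the data: o + L + o."""
        return self.o + self.L + self.o


def greedy_broadcast(m: LogP) -> list[int]:
    """Return the time at which each processor first holds the datum (root = processor 0).

    Hypothetical helper: every informed processor forwards the datum to a new
    processor as early as possible; a sender can start its next send
    max(g, o) cycles after starting the previous one.
    """
    ready = [0]      # ready[i] = time at which processor i has received the datum
    heap = [(0, 0)]  # (earliest time a processor can start its next send, processor id)
    while len(ready) < m.P:
        t, sender = heapq.heappop(heap)
        receiver = len(ready)  # next uninformed processor
        arrival = t + m.message_time()
        ready.append(arrival)
        heapq.heappush(heap, (t + max(m.g, m.o), sender))  # sender respects the gap
        heapq.heappush(heap, (arrival, receiver))          # receiver forwards right away
    return ready


if __name__ == "__main__":
    machine = LogP(L=6, o=2, g=4, P=8)  # illustrative values, not measured parameters
    times = greedy_broadcast(machine)
    print(times)       # [0, 10, 14, 18, 20, 22, 24, 24]
    print(max(times))  # 24
```

With L = 6, o = 2, g = 4 and P = 8 the root informs four processors and the last two processors receive the datum at time 24. The resulting tree is unbalanced, because processors informed early have more opportunities to forward; a min-heap keyed on each processor's next free send time is a simple way to express that "send as early as possible" rule.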