LogP: a practical model of parallel computation

Authors:
David E. Culler;Richard M. Karp;David Patterson;Abhijit Sahay;Eunice E. Santos;Klaus Erik Schauser;Ramesh Subramonian;Thorsten von Eicken
Affiliations:
University of California, Berkeley;University of Washington, Seattle;University of California, Berkeley;Iris Financial Engineering and Systems, Inc.;Electrical Engineering and Computer Science Department, Lehigh University;Computer Science Department, University of California, Santa Barbara;Lockheed Corp.;Computer Science Department, Cornell University
Venue:
Communications of the ACM
Year:
1996

Citing 15
Cited 90

Randomized and deterministic simulations of PRAMs by parallel machines with restricted granularity of parallel memories

Acta Informatica
Type architectures, shared memory, and the corollary of modest potential

Annual review of computer science vol. 1, 1986
Towards an architecture-independent analysis of parallel algorithms

STOC '88 Proceedings of the twentieth annual ACM symposium on Theory of computing
A more practical PRAM model

SPAA '89 Proceedings of the first annual ACM symposium on Parallel algorithms and architectures
Communication complexity of PRAMs

Theoretical Computer Science - Special issue: Fifteenth international colloquium on automata, languages and programming, Tampere, Finland, July 1988
A bridging model for parallel computation

Communications of the ACM
A comparison of sorting algorithms for the connection machine CM-2

SPAA '91 Proceedings of the third annual ACM symposium on Parallel algorithms and architectures
Introduction to parallel algorithms and architectures: arrays, trees, hypercubes

Introduction to parallel algorithms and architectures: arrays, trees, hypercubes
Efficient PRAM simulation on a distributed memory machine

STOC '92 Proceedings of the twenty-fourth annual ACM symposium on Theory of computing
LogP: towards a realistic model of parallel computation

PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Optimal broadcast and summation in the LogP model

SPAA '93 Proceedings of the fifth annual ACM symposium on Parallel algorithms and architectures
Fast Parallel Sorting Under LogP: Experience with the CM-5

IEEE Transactions on Parallel and Distributed Systems
Assessing Fast Network Interfaces

IEEE Micro
Parallelism in random access machines

STOC '78 Proceedings of the tenth annual ACM symposium on Theory of computing
Solving triangular linear systems in parallel using substitution

SPDP '95 Proceedings of the 7th IEEE Symposium on Parallel and Distributed Processing

Parallel computation still not ready for the mainstream

Communications of the ACM
From algorithm parallelism to instruction-level parallelism: an encode-decode chain using prefix-sum

Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Auto-blocking matrix-multiplication or tracking BLAS3 performance from source code

PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Contention in shared memory algorithms

Journal of the ACM (JACM)
One-bit counts between unique and sticky

Proceedings of the 1st international symposium on Memory management
BOS is boss: a case for bulk-synchronous object systems

Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
Selecting tile shape for minimal execution time

Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
Efficient distributed algorithms to build inverted files

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Optimal Clustering of Tree-Sweep Computations for High-Latency Parallel Environments

IEEE Transactions on Parallel and Distributed Systems
Compression using efficient multicasting

STOC '00 Proceedings of the thirty-second annual ACM symposium on Theory of computing
Optimal schedules for data-parallel cycle-stealing in networks of workstations (extended abstract)

Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures
Task Allocation on a Network of Processors

IEEE Transactions on Computers
Language support for Morton-order matrices

PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Optimal Schedules for Cycle-Stealing in a Network of Workstations with a Bag-of-Tasks Workload

IEEE Transactions on Parallel and Distributed Systems
Optimal and efficient algorithms for summing and prefix summing on parallel machines

Journal of Parallel and Distributed Computing
On scheduling send-graphs and receive-graphs under the LogP-model

Information Processing Letters
Opportunity Cost Algorithms for Reduction of I/O and Interprocess Communication Overhead in a Computing Cluster

IEEE Transactions on Parallel and Distributed Systems
Parallel Bridging Models and Their Impact on Algorithm Design

ICCS '01 Proceedings of the International Conference on Computational Science-Part II
On the Effectiveness of D-BSP as a Bridging Model of Parallel Computation

ICCS '01 Proceedings of the International Conference on Computational Science-Part II
Efficient Parallel Algorithms for 2-Dimensional Ising Spin Models

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Fault-Tolerance for Token-based Synchronization Protocols

IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
HiHCoHP: Toward a Realistic Communication Model for Hierarchical HyperClusters of Heterogeneous Processors

IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Improving Parallel Job Scheduling Using Runtime Measurements

IPDPS '00/JSSPP '00 Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing
Ahnentafel Indexing into Morton-Ordered Arrays, or Matrix Locality for Free

Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
On the Predictive Quality of BSP-like Cost Functions for NOWs

Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Scheduling Iterative Programs onto LogP-Machine

Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Approximation Algorithms for Scheduling Malleable Tasks under Precedence Constraints

ESA '01 Proceedings of the 9th Annual European Symposium on Algorithms
A Data-Clustering Algorithm on Distributed Memory Multiprocessors

Revised Papers from Large-Scale Parallel Data Mining, Workshop on Large-Scale Parallel KDD Systems, SIGKDD
On Stalling in LogP

IPDPS '00 Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing
On the Parallel Execution Time of Tiled Loops

IEEE Transactions on Parallel and Distributed Systems
Distributing and scheduling divisible task on parallel communicating processors

Journal of Computer Science and Technology
Optimal sharing of bags of tasks in heterogeneous clusters

Proceedings of the fifteenth annual ACM symposium on Parallel algorithms and architectures
Parallel Complexity of Matrix Multiplication

The Journal of Supercomputing
QR factorization with Morton-ordered quadtree matrices for memory re-use and parallelism

Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Asymptotically Optimal Worksharing in HNOWs: How Long is "Sufficiently Long?"

ANSS '03 Proceedings of the 36th annual symposium on Simulation
A Methodology for Designing Efficient On-Chip Interconnects on Well-Behaved Communication Patterns

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Δ-stepping: a parallelizable shortest path algorithm

Journal of Algorithms
Predicting and Evaluating Distributed Communication Performance

Proceedings of the 2004 ACM/IEEE conference on Supercomputing
MRNet: A Software-Based Multicast/Reduction Network for Scalable Tools

Proceedings of the 2003 ACM/IEEE conference on Supercomputing
The Opie compiler from row-major source to Morton-ordered matrices

WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
On stalling in LogP

Journal of Parallel and Distributed Computing
Efficient trigger-broadcasting in heterogeneous clusters

Journal of Parallel and Distributed Computing
Scheduling malleable tasks with precedence constraints

Proceedings of the seventeenth annual ACM symposium on Parallelism in algorithms and architectures
Performance and Scalability Analysis of Teraflop-Scale Parallel Architectures Using Multidimensional Wavefront Applications

International Journal of High Performance Computing Applications
Performance Evaluation of Linear Algebra Routines

International Journal of High Performance Computing Applications
Application Resource Requirement Estimation in a Parallel-Pipeline Model of Execution

IEEE Transactions on Parallel and Distributed Systems
A Design Methodology for Efficient Application-Specific On-Chip Interconnects

IEEE Transactions on Parallel and Distributed Systems
Optimal and efficient parallel tridiagonal solvers using direct methods

The Journal of Supercomputing - Special issue: Parallel and distributed processing and applications
An experimental evaluation of the HP V-class and SGI origin 2000 multiprocessors using microbenchmarks and scientific applications

International Journal of Parallel Programming
An approximation algorithm for scheduling malleable tasks under general precedence constraints

ACM Transactions on Algorithms (TALG)
Scalable, fault tolerant membership for MPI tasks on HPC systems

Proceedings of the 20th annual international conference on Supercomputing
Parallel bisecting k-means with prediction clustering algorithm

The Journal of Supercomputing
$\log_{\rm n}{\rm P}$ and $\log_{3}{\rm P}$: Accurate Analytical Models of Point-to-Point Communication in Distributed Systems

IEEE Transactions on Computers
Synchronous parallel kinetic Monte Carlo for continuum diffusion-reaction systems

Journal of Computational Physics
Foundations for the integration of scheduling techniques into compilers for parallel languages

International Journal of Computational Science and Engineering
Efficient algorithms for parallelizing Monte Carlo simulations for 2D Ising spin models

The Journal of Supercomputing
A regression-based approach to scalability prediction

Proceedings of the 22nd annual international conference on Supercomputing
Optimal broadcast for fully connected processor-node networks

Journal of Parallel and Distributed Computing
A framework for adaptive collective communications for heterogeneous hierarchical computing systems

Journal of Computer and System Sciences
Adaptive approaches for efficient parallel algorithms on cluster-based systems

International Journal of Grid and Utility Computing
Mapping parallelism to multi-cores: a machine learning based approach

Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
What the parallel-processing community has (failed) to offer the multi/many-core generation

Journal of Parallel and Distributed Computing
On designing optimal parallel triangular solvers

Information and Computation
Algorithms for memory hierarchies: advanced lectures

Algorithms for memory hierarchies: advanced lectures
Evolutionary Rough Parallel Multi-Objective Optimization Algorithm

Fundamenta Informaticae
Partitioning streaming parallelism for multi-cores: a machine learning based approach

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Fitting FFT onto an energy efficient massively parallel architecture

Proceedings of the Second International Forum on Next-Generation Multicore/Manycore Technologies
Parallel algorithms

Algorithms and theory of computation handbook
A bridging model for multi-core computing

Journal of Computer and System Sciences
Contention-aware scheduling with task duplication

Journal of Parallel and Distributed Computing
Algorithm engineering: bridging the gap between algorithm theory and practice

Algorithm engineering: bridging the gap between algorithm theory and practice
The round complexity of distributed sorting: extended abstract

Proceedings of the 30th annual ACM SIGACT-SIGOPS symposium on Principles of distributed computing
A contention-aware performance model for HPC-based networks: a case study of the InfiniBand network

Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
DOT: a matrix model for analyzing, optimizing and deploying software for big data analytics in distributed systems

Proceedings of the 2nd ACM Symposium on Cloud Computing
Scheduling malleable tasks with precedence constraints

Journal of Computer and System Sciences
Total exchange performance modelling under network contention

PPAM'05 Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics
Multi-DaC programming model: a variant of multi-BSP model for divide-and-conquer algorithms

DAMP '12 Proceedings of the 7th workshop on Declarative aspects and applications of multicore programming
Optimal broadcast for fully connected networks

HPCC'05 Proceedings of the First international conference on High Performance Computing and Communications
An approximation algorithm for scheduling malleable tasks under general precedence constraints

ISAAC'05 Proceedings of the 16th international conference on Algorithms and Computation
Prediction of communication latency over complex network behaviors on SMP clusters

EPEW'05/WS-FM'05 Proceedings of the 2005 international conference on European Performance Engineering, and Web Services and Formal Methods, international conference on Formal Techniques for Computer Systems and Business Processes
Compiler-Directed performance model construction for parallel programs

ARCS'10 Proceedings of the 23rd international conference on Architecture of Computing Systems
Performance analysis and optimization of MPI collective operations on multi-core clusters

The Journal of Supercomputing
Multifaceted web services: an approach to secure and scalable grid scheduling

EuroWeb'02 Proceedings of the 2002 international conference on EuroWeb
Self-consistent MPI performance requirements

PVM/MPI'07 Proceedings of the 14th European conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Modeling and predicting performance of high performance computing applications on hardware accelerators

International Journal of High Performance Computing Applications
Performance modeling of hybrid MPI/OpenMP scientific applications on large-scale multicore supercomputers

Journal of Computer and System Sciences
Using machine learning to partition streaming programs

ACM Transactions on Architecture and Code Optimization (TACO)
Making queries tractable on big data with preprocessing: through the eyes of complexity theory

Proceedings of the VLDB Endowment
Ingredients of adaptability: a survey of reconfigurable processors

VLSI Design
Measurement of the latency parameters of the Multi-BSP model: a multicore benchmarking approach

The Journal of Supercomputing

Quantified Score

Hi-index	48.24

Abstract