Type architectures, shared memory, and the corollary of modest potential
Annual review of computer science vol. 1, 1986
Towards an architecture-independent analysis of parallel algorithms
STOC '88 Proceedings of the twentieth annual ACM symposium on Theory of computing
On communication latency in PRAM computations
SPAA '89 Proceedings of the first annual ACM symposium on Parallel algorithms and architectures
The APRAM: incorporating asynchrony into the PRAM model
SPAA '89 Proceedings of the first annual ACM symposium on Parallel algorithms and architectures
Efficient parallel algorithms can be made robust
Proceedings of the eighth annual ACM Symposium on Principles of distributed computing
Computer architecture: a quantitative approach
Communication complexity of PRAMs
Theoretical Computer Science - Special issue: Fifteenth international colloquium on automata, languages and programming, Tampere, Finland, July 1988
A bridging model for parallel computation
Communications of the ACM
Performance Analysis of k-ary n-cube Interconnection Networks
IEEE Transactions on Computers
Efficient robust parallel computations
STOC '90 Proceedings of the twenty-second annual ACM symposium on Theory of computing
Efficient PRAM simulation on a distributed memory machine
STOC '92 Proceedings of the twenty-fourth annual ACM symposium on Theory of computing
The Stanford Dash Multiprocessor
Computer
Ultracomputers: a teraflop before its time
Communications of the ACM
Active messages: a mechanism for integrated communication and computation
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Designing broadcasting algorithms in the postal model for message-passing systems
SPAA '92 Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures
Monsoon: an explicit token-store architecture
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Parallelism in random access machines
STOC '78 Proceedings of the tenth annual ACM symposium on Theory of computing
LogP: Towards a Realistic Model of Parallel Computation
Optimal Broadcast and Summation in the LogP Model
Hiding Communication Costs in Bandwidth-Limited Parallel FFT
Parallel algorithms column 1: models of computation
ACM SIGACT News
Optimal broadcast and summation in the LogP model
SPAA '93 Proceedings of the fifth annual ACM symposium on Parallel algorithms and architectures
Contention in shared memory algorithms
STOC '93 Proceedings of the twenty-fifth annual ACM symposium on Theory of computing
Modeling communication in parallel algorithms: a fruitful interaction between theory and systems?
SPAA '94 Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures
An architecture for optimal all-to-all personalized communication
SPAA '94 Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures
Efficient algorithms for all-to-all communications in multi-port message-passing systems
SPAA '94 Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures
Maya: a simulation platform for distributed shared memories
PADS '94 Proceedings of the eighth workshop on Parallel and distributed simulation
An approach to scalability study of shared memory parallel systems
SIGMETRICS '94 Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Trade-offs between communication throughput and parallel time
STOC '94 Proceedings of the twenty-sixth annual ACM symposium on Theory of computing
Where is time spent in message-passing and shared-memory programs?
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
A performance evaluation of lock-free synchronization protocols
PODC '94 Proceedings of the thirteenth annual ACM symposium on Principles of distributed computing
CCL: A Portable and Tunable Collective Communication Library for Scalable Parallel Computers
IEEE Transactions on Parallel and Distributed Systems
High-level optimization via automated statistical modeling
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Performance predictions for parallel diagonal-implicitly iterated Runge-Kutta methods
PADS '95 Proceedings of the ninth workshop on Parallel and distributed simulation
A randomized parallel 3D convex hull algorithm for coarse grained multicomputers
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
Accounting for memory bank contention and delay in high-bandwidth multiprocessors
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
Parallel sorting with limited bandwidth
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
Universal congestion control for meshes
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
The interaction of parallel and sequential workloads on a network of workstations
Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Parallelism in sequential functional languages
FPCA '95 Proceedings of the seventh international conference on Functional programming languages and computer architecture
Upper time bounds for executing PRAM-programs on the LogP-machine
ICS '95 Proceedings of the 9th international conference on Supercomputing
On the Design and Implementation of Broadcast and Global Combine Operations Using the Postal Model
IEEE Transactions on Parallel and Distributed Systems
Practical parallel algorithms for personalized communication and integer sorting
Journal of Experimental Algorithmics (JEA)
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Towards efficiency and portability: programming with the BSP model
Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
Asynchronous shared memory search structures
Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
On multiprocessor system scheduling
Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
Synchronization hardware for networks of workstations: performance vs. cost
ICS '96 Proceedings of the 10th international conference on Supercomputing
Communication-efficient parallel sorting (preliminary version)
STOC '96 Proceedings of the twenty-eighth annual ACM symposium on Theory of computing
LogP: a practical model of parallel computation
Communications of the ACM
Fast Parallel Sorting Under LogP: Experience with the CM-5
IEEE Transactions on Parallel and Distributed Systems
The Block Distributed Memory Model
IEEE Transactions on Parallel and Distributed Systems
A quantitative comparison of parallel computation models
Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
A bridging model for parallel computation, communication, and I/O
ACM Computing Surveys (CSUR) - Special issue: position statements on strategic directions in computing research
Can shared-memory model serve as a bridging model for parallel computation?
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Modeling parallel bandwidth: local vs. global restrictions
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Triplex: a multi-class routing algorithm
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Fine-grain multithreading with the EM-X multiprocessor
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Performance considerations in software multicasts
ICS '97 Proceedings of the 11th international conference on Supercomputing
Performance implications of communication mechanisms in all-software global address space systems
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
LoPC: modeling contention in parallel algorithms
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Effects of communication latency, overhead, and bandwidth in a cluster architecture
Proceedings of the 24th annual international symposium on Computer architecture
Accounting for Memory Bank Contention and Delay in High-Bandwidth Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Efficient Algorithms for the Reduce-Scatter Operation in LogGP
IEEE Transactions on Parallel and Distributed Systems
Exploiting local data in parallel array I/O on a practical network of workstations
Proceedings of the fifth workshop on I/O in parallel and distributed systems
Maintaining dynamic geometric objects on parallel processors
PRS '97 Proceedings of the IEEE symposium on Parallel rendering
Generating an Efficient Broadcast Sequence Using Reflected Gray Codes
IEEE Transactions on Parallel and Distributed Systems
Efficient Algorithms for All-to-All Communications in Multiport Message-Passing Systems
IEEE Transactions on Parallel and Distributed Systems
Abstractions for Portable, Scalable Parallel Programming
IEEE Transactions on Parallel and Distributed Systems
Communication-optimal parallel minimum spanning tree algorithms (extended abstract)
Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
Computational bounds for fundamental problems on general-purpose parallel models
Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
Load balanced parallel radix sort
ICS '98 Proceedings of the 12th international conference on Supercomputing
Scheduling with implicit information in distributed systems
SIGMETRICS '98/PERFORMANCE '98 Proceedings of the 1998 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
LoGPC: modeling network contention in message-passing programs
SIGMETRICS '98/PERFORMANCE '98 Proceedings of the 1998 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Support for Efficient Programming on the SB-PRAM
International Journal of Parallel Programming
Models and languages for parallel computation
ACM Computing Surveys (CSUR)
A quantitative comparison of parallel computation models
ACM Transactions on Computer Systems (TOCS)
A new deterministic parallel sorting algorithm with an experimental evaluation
Journal of Experimental Algorithmics (JEA)
A Gracefully Degrading Massively Parallel System Using the BSP Model, and Its Evaluation
IEEE Transactions on Computers
An Application-Driven Study of Parallel System Overheads and Network Bandwidth Requirements
IEEE Transactions on Parallel and Distributed Systems
Scaling application performance on a cache-coherent multiprocessor
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
MagPIe: MPI's collective communication operations for clustered wide area systems
Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Predictive analysis of a wavefront application using LogGP
Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
NFS sensitivity to high performance networks
SIGMETRICS '99 Proceedings of the 1999 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Resource Scaling Effects on MPP Performance: The STAP Benchmark Implications
IEEE Transactions on Parallel and Distributed Systems
Problem space promotion and its evaluation as a technique for efficient parallel computation
ICS '99 Proceedings of the 13th international conference on Supercomputing
BOS is boss: a case for bulk-synchronous object systems
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
Preemptive scheduling of parallel jobs on multiprocessors
Proceedings of the seventh annual ACM-SIAM symposium on Discrete algorithms
Randomized fully-scalable BSP techniques for multi-searching and convex hull construction
SODA '97 Proceedings of the eighth annual ACM-SIAM symposium on Discrete algorithms
The QRQW PRAM: accounting for contention in parallel algorithms
SODA '94 Proceedings of the fifth annual ACM-SIAM symposium on Discrete algorithms
Emulations between QSM, BSP, and LogP: a framework for general-purpose parallel algorithm design
Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms
On the partitionability of hierarchical radiosity
PVGS '99 Proceedings of the 1999 IEEE symposium on Parallel visualization and graphics
Fault-Tolerant Communication Algorithms in Toroidal Networks
IEEE Transactions on Parallel and Distributed Systems
Portable and Efficient Parallel Computing Using the BSP Model
IEEE Transactions on Computers
Coarse grained parallel computing on heterogeneous systems
SAC '98 Proceedings of the 1998 ACM symposium on Applied Computing
Improved parallel and sequential walking tree methods for biological string alignments
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Toward Formally-Based Design of Message Passing Programs
IEEE Transactions on Software Engineering - Special issue on architecture-independent languages and software tools for parallel processing
A Design Methodology for Data-Parallel Applications
IEEE Transactions on Software Engineering - Special issue on architecture-independent languages and software tools for parallel processing
A Transformation Approach to Derive Efficient Parallel Implementations
IEEE Transactions on Software Engineering - Special issue on architecture-independent languages and software tools for parallel processing
Scatter and gather operations on an asynchronous communication model
SAC '00 Proceedings of the 2000 ACM symposium on Applied computing - Volume 2
A formal model for the parallel semantics of P3L
SAC '00 Proceedings of the 2000 ACM symposium on Applied computing - Volume 2
The Journal of Supercomputing
Profiling a parallel language based on fine-grained communication
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Automatically tuned collective communications
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
H-BSP: A Hierarchical BSP Computation Model
The Journal of Supercomputing
Parallelizing the Murϕ Verifier
Formal Methods in System Design - Special issue on CAV '97
LoGPC: Modeling Network Contention in Message-Passing Programs
IEEE Transactions on Parallel and Distributed Systems
Towards practical deterministic write-all algorithms
Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
LogGPS: a parallel computational model for synchronization analysis
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Tolerating communication latency through dynamic thread invocation in a multithreaded architecture
Compiler optimizations for scalable parallel systems
Implicit coscheduling: coordinated scheduling with implicit information in distributed systems
ACM Transactions on Computer Systems (TOCS)
SimpleFit: A Framework for Analyzing Design Trade-Offs in Raw Architectures
IEEE Transactions on Parallel and Distributed Systems
An implementation and analysis of the virtual interface architecture
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Communication overlap in multi-tier parallel algorithms
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Multi-protocol active messages on a cluster of SMP's
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Performance prediction for random write reductions: a case study in modeling shared memory programs
SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Near-optimal adaptive control of a large grid application
ICS '02 Proceedings of the 16th international conference on Supercomputing
IEEE Transactions on Parallel and Distributed Systems
Evaluating the running time of a communication round over the internet
Proceedings of the twenty-first annual symposium on Principles of distributed computing
Use of a CORBA/RMI gateway: characterization of communication overhead
WOSP '02 Proceedings of the 3rd international workshop on Software and performance
On scheduling send-graphs and receive-graphs under the LogP-model
Information Processing Letters
Parallel FFT on ATM-based networks of workstations
Cluster Computing
Architectural differences of efficient sequential and parallel computers
Journal of Systems Architecture: the EUROMICRO Journal
An Application-Driven Study of Multicast Communication for Write Invalidation
The Journal of Supercomputing
Adaptive parallel computing on heterogeneous networks with mpC
Parallel Computing
Performance-steered design of software architectures for embedded multicore systems
Software—Practice & Experience
A software architecture for user transparent parallel image processing
Parallel Computing - Parallel computing in image and video processing
Modeling Communication Overhead: MPI and MPL Performance on the IBM SP2
IEEE Parallel & Distributed Technology: Systems & Technology
A Case for NOW (Networks of Workstations)
IEEE Micro
Assessing Fast Network Interfaces
IEEE Micro
Computing Global Combine Operations in the Multiport Postal Model
IEEE Transactions on Parallel and Distributed Systems
POEMS: End-to-End Performance Design of Large Parallel Adaptive Computational Systems
IEEE Transactions on Software Engineering
Symbolic Performance Modeling of Parallel Systems
IEEE Transactions on Parallel and Distributed Systems
Improving the Throughput of Remote Storage Access through Pipelining
GRID '02 Proceedings of the Third International Workshop on Grid Computing
Towards an Accurate Model for Collective Communications
ICCS '01 Proceedings of the International Conference on Computational Sciences-Part I
Fault Tolerant MPI for the HARNESS Meta-computing System
ICCS '01 Proceedings of the International Conference on Computational Sciences-Part I
Parallel Models and Job Characterization for System Scheduling
ICCS '01 Proceedings of the International Conference on Computational Science-Part II
Parallel Algorithm Design with Coarse-Grained Synchronization
ICCS '01 Proceedings of the International Conference on Computational Science-Part II
An Analytical Model for a Class of Architectures under Master-Slave Paradigm
HPCN Europe 2000 Proceedings of the 8th International Conference on High-Performance Computing and Networking
Skel-BSP: Performance Portability for Skeletal Programming
HPCN Europe 2000 Proceedings of the 8th International Conference on High-Performance Computing and Networking
Source Code and Task Graphs in Program Optimization
HPCN Europe 2001 Proceedings of the 9th International Conference on High-Performance Computing and Networking
ECO: Efficient Collective Operations for Communication on Heterogeneous Networks
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Parallel Implementation of Borůvka's Minimum Spanning Tree Algorithm
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
A Randomized Algorithm for Voronoi Diagram of Line Segments on Coarse-Grained Multiprocessors
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Parallel 'Go with the Winners' Algorithms in the LogP Model
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
A 2-D Parallel Convex Hull Algorithm with Optimal Communication Phases
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Optimizing Parallel Bitonic Sort
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Architecture-Dependent Tuning of the Parameterized Communication Model for Optimal Multicasting
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
A Parallel LCC Simulation System
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Exploiting Hierarchy in Heterogeneous Environments
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Predicting Scalability of Parallel Garbage Collectors on Shared Memory Multiprocessors
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
On the Design of Clustering-based Scheduling Algorithms for Realistic Machine Models
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Fast Measurement of LogP Parameters for Message Passing Platforms
IPDPS '00 Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing
A Hardware Implementation of PRAM and Its Performance Evaluation
IPDPS '00 Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing
VIBe: A Micro-benchmark Suite for Evaluating Virtual Interface Architecture (VIA) Implementations
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Cost Hierarchies for Abstract Parallel Machines
LCPC '00 Proceedings of the 13th International Workshop on Languages and Compilers for Parallel Computing-Revised Papers
kappa NUMA: A Model for Clusters of SMP-Machines
PPAM '01 Proceedings of the th International Conference on Parallel Processing and Applied Mathematics-Revised Papers
SWAT '02 Proceedings of the 8th Scandinavian Workshop on Algorithm Theory
RP*: A Family of Order Preserving Scalable Distributed Data Structures
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Implementation and Analysis of a Parallel Collection Query Language
VLDB '96 Proceedings of the 22nd International Conference on Very Large Data Bases
Asynchronous Random Polling Dynamic Load Balancing
ISAAC '99 Proceedings of the 10th International Symposium on Algorithms and Computation
Improving Parallel Job Scheduling Using Runtime Measurements
IPDPS '00/JSSPP '00 Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing
On Minimising the Processor Requirements of LogP Schedules
Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
VizzScheduler - A Framework for the Visualization of Scheduling Algorithms
Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
Symbolic Cost Estimation of Parallel Applications
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Optimal Scheduling Algorithms for Communication Constrained Parallel Processing
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
On Scheduling Task-Graphs to LogP-Machines with Disturbances
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Scheduling Arbitrary Task Graphs on LogP Machines
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Scheduling Iterative Programs onto LogP-Machine
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
VECPAR '00 Selected Papers and Invited Talks from the 4th International Conference on Vector and Parallel Processing
Graph Coloring on a Coarse Grained Multiprocessor
WG '00 Proceedings of the 26th International Workshop on Graph-Theoretic Concepts in Computer Science
Handling Graphs According to a Coarse Grained Approach: Experiments with PVM and MPI
Proceedings of the 7th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Validation of Dimemas Communication Model for MPI Collective Operations
Proceedings of the 7th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
ACCT: Automatic Collective Communications Tuning
Proceedings of the 7th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Adaptive Execution of Pipelines
Proceedings of the 8th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Architecture Independent Analysis of Parallel Programs
ICCS '01 Proceedings of the International Conference on Computational Science-Part II
A Customizable Simulator for Workstation Networks
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
A Cost Model for Asynchronous and Structured Message Passing
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Towards the automatic optimal mapping of pipeline algorithms
Parallel Computing
HARNESS fault tolerant MPI design, usage and performance issues
Future Generation Computer Systems - Grid computing: Towards a new computing infrastructure
EPPP - an integrated environment for portable parallel programming
CASCON '94 Proceedings of the 1994 conference of the Centre for Advanced Studies on Collaborative research
Automatic decomposition in EPPP compiler
CASCON '94 Proceedings of the 1994 conference of the Centre for Advanced Studies on Collaborative research
High performance RDMA-based MPI implementation over InfiniBand
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
A3: a simple and asymptotically accurate model for parallel computation
FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation
Abstracting network characteristics and locality properties of parallel systems
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Evaluation of Various Node Configurations for Fine-grain Multithreading on Stock Processors
HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
On finding optimal clusterings of task graphs
PAS '95 Proceedings of the First Aizu International Symposium on Parallel Algorithms/Architecture Synthesis
Algorithm engineering for parallel computation
Experimental algorithmics
MPICH-G2: a Grid-enabled implementation of the Message Passing Interface
Journal of Parallel and Distributed Computing - Special issue on computational grids
Experimental Validation of Parallel Computation Models on the Intel Paragon
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Predicting the Running Times of Parallel Programs by Simulation
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Managing Concurrent Access for Shared Memory Active Messages
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Latency, Occupancy, and Bandwidth in DSM Multiprocessors: A Performance Evaluation
IEEE Transactions on Computers
Contention-Aware Communication Schedule for High-Speed Communication
Cluster Computing
On the elusive benefits of protocol offload
NICELI '03 Proceedings of the ACM SIGCOMM workshop on Network-I/O convergence: experience, lessons, implications
Graph coloring on coarse grained multicomputers
Discrete Applied Mathematics - Special issue: The second international colloquium, "journées de l'informatique messine"
Parallel 'go with the winners' algorithms in distributed memory models
Journal of Parallel and Distributed Computing - Special section best papers from the 2002 international parallel and distributed processing symposium
Efficient implementation of reduce-scatter in MPI
Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Parallel, distributed and network-based processing
Incorporating memory layout in the modeling of message passing programs
Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Parallel, distributed and network-based processing
Parallel Computing - Special issue: High performance computing with geographical data
Optimal broadcast on parallel locality models
Journal of Discrete Algorithms
Emulations between QSM, BSP and LogP: a framework for general-purpose parallel algorithm design
Journal of Parallel and Distributed Computing
Parallel program performance prediction using deterministic task graph analysis
ACM Transactions on Computer Systems (TOCS)
Performance modelling for task-parallel programs
Performance analysis and grid computing
Quantification of memory communication
High performance scientific and engineering computing
Opportunities and challenges in application-tuned circuits and architectures based on nanodevices
Proceedings of the 1st conference on Computing frontiers
Dynamic TCP acknowledgment in the LogP model
Journal of Algorithms
A tool for the design and evaluation of hybrid scheduling algorithms for computational grids
MGC '04 Proceedings of the 2nd workshop on Middleware for grid computing
Predicting the Performance of Synchronous Discrete Event Simulation
IEEE Transactions on Parallel and Distributed Systems
Configuration of distributed message converter systems
Performance Evaluation
Performance Comparison of MPI Implementations over InfiniBand, Myrinet and Quadrics
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Scalable NIC-based Reduction on Large-scale Clusters
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Coarse grained gather and scatter operations with applications
Journal of Parallel and Distributed Computing
On performance analysis of heterogeneous parallel algorithms
Parallel Computing
On searching strategies, parallel questions, and delayed answers
Discrete Applied Mathematics - Fun with algorithms 2 (FUN 2001)
A Hardware Acceleration Unit for MPI Queue Processing
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Enhancing NIC Performance for MPI using Processing-in-Memory
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 9 - Volume 10
System Support to Balance the Resource Supply and Demand in High-end Computing
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 10 - Volume 11
A Method for MPI Broadcast in Computational Grids
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 13 - Volume 14
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 14 - Volume 15
Performance Analysis of MPI Collective Operations
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 15 - Volume 16
Pace--A Toolset for the Performance Prediction of Parallel and Distributed Systems
International Journal of High Performance Computing Applications
A tight analysis and near-optimal instances of the algorithm of Anderson and Woll
Theoretical Computer Science
A Coarse-Grained multicomputer algorithm for the detection of repetitions
Information Processing Letters
Communication Contention in Task Scheduling
IEEE Transactions on Parallel and Distributed Systems
Broadcasting on networks of workstations
Proceedings of the seventeenth annual ACM symposium on Parallelism in algorithms and architectures
The design and implementation of LilyTask in shared memory
ACM SIGOPS Operating Systems Review
A methodology for detailed performance modeling of reduction computations on SMP machines
Performance Evaluation - Performance modelling and evaluation of high-performance parallel and distributed systems
Towards an Accurate Model for Collective Communications
International Journal of High Performance Computing Applications
OS support for a commodity database on PC clusters: distributed devices vs. distributed file systems
ADC '05 Proceedings of the 16th Australasian database conference - Volume 39
Making the Most Out of Direct-Access Network Attached Storage
FAST '03 Proceedings of the 2nd USENIX Conference on File and Storage Technologies
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
The MHETA Execution Model for Heterogeneous Clusters
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Cross-Platform Performance Prediction of Parallel Applications Using Partial Execution
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Performance Modeling and Tuning Strategies of Mixed Mode Collective Communications
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Toward a Realistic Task Scheduling Model
IEEE Transactions on Parallel and Distributed Systems
High performance RDMA-based MPI implementation over InfiniBand
International Journal of Parallel Programming - Special issue I: The 17th annual international conference on supercomputing (ICS'03)
MOLAR: adaptive runtime support for high-end computing operating and runtime systems
ACM SIGOPS Operating Systems Review
A detailed MPI communication model for distributed systems
Future Generation Computer Systems
Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Parallel, distributed and network-based processing
Self-adapting numerical software (SANS) effort
IBM Journal of Research and Development
PEMPIs: a new methodology of modeling and prediction of MPI programs performance
International Journal of Parallel Programming
Modeling instruction placement on a spatial architecture
Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures
The cache complexity of multithreaded cache oblivious algorithms
Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures
A Parallel Independent Component Analysis Algorithm
ICPADS '06 Proceedings of the 12th International Conference on Parallel and Distributed Systems - Volume 1
Performance Modeling of Communication and Computation in Hybrid MPI and OpenMP Applications
ICPADS '06 Proceedings of the 12th International Conference on Parallel and Distributed Systems - Volume 2
A 5/4-approximation algorithm for scheduling identical malleable tasks
Theoretical Computer Science - Approximation and online algorithms
Scalable, fault tolerant membership for MPI tasks on HPC systems
Proceedings of the 20th annual international conference on Supercomputing
A Parallel Computational Model for Heterogeneous Clusters
IEEE Transactions on Parallel and Distributed Systems
Performance modeling and optimization of a high energy colliding beam simulation code
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Parallel sparse LU factorization on different message passing platforms
Journal of Parallel and Distributed Computing
Parallel Computing - Optimization on grids - Optimization for grids
Optimizing communication overlap for high-speed networks
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Asynchronous resource discovery in peer-to-peer networks
Computer Networks: The International Journal of Computer and Telecommunications Networking
The design and development of ZPL
Proceedings of the third ACM SIGPLAN conference on History of programming languages
Performance engineering, PSEs and the GRID
Scientific Programming
Performance analysis of MPI collective operations
Cluster Computing
An efficient MPI_allgather for grids
Proceedings of the 16th international symposium on High performance distributed computing
PRO: a model for the design and analysis of efficient and scalable parallel algorithms
Nordic Journal of Computing
RAT: a methodology for predicting performance in application design migration to FPGAs
HPRCTA '07 Proceedings of the 1st international workshop on High-performance reconfigurable computing technology and applications: held in conjunction with SC07
An Evaluation of Marenostrum Performance
International Journal of High Performance Computing Applications
Performance without pain = productivity: data layout and collective communication in UPC
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
NIC-based reduction algorithms for large-scale clusters
International Journal of High Performance Computing and Networking
Performance evaluation of the Sun Fire Link SMP clusters
International Journal of High Performance Computing and Networking
HPM: a hierarchical model for parallel computations
International Journal of High Performance Computing and Networking
Foundations for the integration of scheduling techniques into compilers for parallel languages
International Journal of Computational Science and Engineering
Distributed computing and the multicore revolution
ACM SIGACT News
Implementation and performance analysis of non-blocking collective operations for MPI
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
FPGA-based prototype of a PRAM-on-chip processor
Proceedings of the 5th conference on Computing frontiers
Techniques for pipelined broadcast on ethernet switched clusters
Journal of Parallel and Distributed Computing
Performance portable optimizations for loops containing communication operations
Proceedings of the 22nd annual international conference on Supercomputing
Optimal speedup on a low-degree multi-core parallel architecture (LoPRAM)
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
Fundamental parallel algorithms for private-cache chip multiprocessors
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
Cache-efficient dynamic programming algorithms for multicores
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
Overcoming the processor communication overhead in MPI applications
SpringSim '07 Proceedings of the 2007 spring simulation multiconference - Volume 2
A pilot study to compare programming effort for two parallel programming models
Journal of Systems and Software
A Software Tool for Accurate Estimation of Parameters of Heterogeneous Communication Models
Proceedings of the 15th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Automatic Communication Performance Debugging in PGAS Languages
Languages and Compilers for Parallel Computing
RAT: RC Amenability Test for Rapid Performance Prediction
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
A unified model for multicore architectures
IFMT '08 Proceedings of the 1st international forum on Next-generation multicore/manycore technologies
On parallel image retrieval with dynamically extracted features
Parallel Computing
On modeling and evaluating multicomputer transcoding architectures for live-video streams
Multimedia Tools and Applications
Boolean circuit programming: A new paradigm to design parallel algorithms
Journal of Discrete Algorithms
Global Principal Typing in Partially Commutative Asynchronous Sessions
ESOP '09 Proceedings of the 18th European Symposium on Programming Languages and Systems: Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009
WARPP: a toolkit for simulating high-performance parallel scientific codes
Proceedings of the 2nd International Conference on Simulation Tools and Techniques
Accurate and Efficient Estimation of Parameters of Heterogeneous Communication Performance Models
International Journal of High Performance Computing Applications
Extracting and predicting the communication behaviour of parallel applications
International Journal of Parallel, Emergent and Distributed Systems
A domain-specific approach for software development on Manycore platforms
ACM SIGARCH Computer Architecture News
Type-Directed Compilation for Multicore Programming
Electronic Notes in Theoretical Computer Science (ENTCS)
Developing parallel programs: A design-oriented perspective
IWMSE '09 Proceedings of the 2009 ICSE Workshop on Multicore Software Engineering
What the parallel-processing community has (failed) to offer the multi/many-core generation
Journal of Parallel and Distributed Computing
Evaluating multicore algorithms on the unified memory model
Scientific Programming - Software Development for Multi-core Computing Systems
Modeling advanced collective communication algorithms on cell-based systems
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Optimum Broadcasting in Complete Weighted-Vertex Graphs
SOFSEM '10 Proceedings of the 36th Conference on Current Trends in Theory and Practice of Computer Science
Graph coloring on coarse grained multicomputers
Discrete Applied Mathematics
An Adaptive High-Performance Service Architecture
Electronic Notes in Theoretical Computer Science (ENTCS)
An efficient collective communication method for grid scale networks
ICCS'03 Proceedings of the 2003 international conference on Computational science
Model for simulation of heterogeneous high-performance computing environments
VECPAR'06 Proceedings of the 7th international conference on High performance computing for computational science
ICCSA'03 Proceedings of the 2003 international conference on Computational science and its applications: Part II
Modeling multigrain parallelism on heterogeneous multi-core processors: a case study of the cell BE
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Low depth cache-oblivious algorithms
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Low-contention data structures
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Manycore performance analysis using timed configuration graphs
SAMOS'09 Proceedings of the 9th international conference on Systems, architectures, modeling and simulation
Throughput optimal total order broadcast for cluster environments
ACM Transactions on Computer Systems (TOCS)
mPlogP: A Parallel Computation Model for Heterogeneous Multi-core Computer
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
LogGOPSim: simulating large-scale applications in the LogGOPS model
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
The LogP and MLogP models for parallel image processing with multi-core microprocessor
Proceedings of the 2010 Symposium on Information and Communication Technology
Optimizing collective communication on multicores
HotPar'09 Proceedings of the First USENIX conference on Hot topics in parallelism
Design principles for end-to-end multicore schedulers
HotPar'10 Proceedings of the 2nd USENIX conference on Hot topics in parallelism
Estimating parallel performance, a skeleton-based approach
Proceedings of the fourth international workshop on High-level parallel programming and applications
A model of computation for MapReduce
SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
Revisiting Cramer's rule for solving dense linear systems
SpringSim '10 Proceedings of the 2010 Spring Simulation Multiconference
Parallel computation: models and complexity issues
Algorithms and theory of computation handbook
Incorporating memory layout in the modeling of message passing programs
EUROMICRO-PDP'02 Proceedings of the 10th Euromicro conference on Parallel, distributed and network-based processing
Efficient implementation of reduce-scatter in MPI
EUROMICRO-PDP'02 Proceedings of the 10th Euromicro conference on Parallel, distributed and network-based processing
Predictability of bulk synchronous programs using MPI
EURO-PDP'00 Proceedings of the 8th Euromicro conference on Parallel and distributed processing
Towards efficient BSP implementations of BSR programs for some computational geometry problems
EURO-PDP'00 Proceedings of the 8th Euromicro conference on Parallel and distributed processing
Fast barrier synchronization for InfiniBand™
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
LogfP - a model for small messages in InfiniBand
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
A performance model for fine-grain accesses in UPC
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
An economy-driven mapping heuristic for hierarchical master-slave applications in grid systems
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Reining in the outliers in map-reduce clusters using Mantri
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Performance comparison of some shared memory organizations for 2D mesh-like NOCs
Microprocessors & Microsystems
Reliable performance prediction for multigrid software on distributed memory systems
Advances in Engineering Software
Parallel evaluation of conjunctive queries
Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Scheduling irregular parallel computations on hierarchical caches
Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
An idiom-finding tool for increasing productivity of accelerators
Proceedings of the international conference on Supercomputing
An analytical model for multilevel performance prediction of Multi-FPGA systems
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Balance principles for algorithm-architecture co-design
HotPar'11 Proceedings of the 3rd USENIX conference on Hot topics in parallelism
Performance modeling for multilevel communication in SHMEM+
Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model
A high performance superpipeline protocol for InfiniBand
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
Cache size in a cost model for heterogeneous skeletons
Proceedings of the fifth international workshop on High-level parallel programming and applications
A framework for an automatic hybrid MPI+OpenMP code generation
Proceedings of the 19th High Performance Computing Symposia
Performance modeling for systematic performance tuning
State of the Practice Reports
Improving communication performance in dense linear algebra via topology aware collectives
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Collective communication costs analysis over gigabit ethernet and infiniband
HiPC'06 Proceedings of the 13th international conference on High Performance Computing
A case for non-blocking collective operations
ISPA'06 Proceedings of the 2006 international conference on Frontiers of High Performance Computing and Networking
Optimizing explicit data transfers for data parallel applications on the cell architecture
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
EuroPVM/MPI'06 Proceedings of the 13th European PVM/MPI User's Group conference on Recent advances in parallel virtual machine and message passing interface
Challenges and issues in benchmarking MPI
EuroPVM/MPI'06 Proceedings of the 13th European PVM/MPI User's Group conference on Recent advances in parallel virtual machine and message passing interface
An efficient collective communication method using a shortest path algorithm in a computational grid
GCC'05 Proceedings of the 4th international conference on Grid and Cooperative Computing
GCC'05 Proceedings of the 4th international conference on Grid and Cooperative Computing
A network evaluation for LAN, MAN and WAN grid environments
EUC'05 Proceedings of the 2005 international conference on Embedded and Ubiquitous Computing
A CGM algorithm solving the longest increasing subsequence problem
ICCSA'06 Proceedings of the 2006 international conference on Computational Science and Its Applications - Volume Part V
PerWiz: a what-if prediction tool for tuning message passing programs
VECPAR'04 Proceedings of the 6th international conference on High Performance Computing for Computational Science
Message strip-mining heuristics for high speed networks
VECPAR'04 Proceedings of the 6th international conference on High Performance Computing for Computational Science
Adapting distributed scientific applications to run-time network conditions
PARA'04 Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing
Improving multilevel approach for optimizing collective communications in computational grids
EGC'05 Proceedings of the 2005 European conference on Advances in Grid Computing
Modeling execution time of selected computation and communication kernels on grids
EGC'05 Proceedings of the 2005 European conference on Advances in Grid Computing
Reconfigurable scientific applications on GRID services
EGC'05 Proceedings of the 2005 European conference on Advances in Grid Computing
A two-phase scheduling algorithm for efficient collective communications of MPICH-G2
ICDCIT'05 Proceedings of the Second international conference on Distributed Computing and Internet Technology
Low-contention data structures
Journal of Parallel and Distributed Computing
SGL: towards a bridging model for heterogeneous hierarchical platforms
International Journal of High Performance Computing and Networking
Mathematical and Computer Modelling: An International Journal
Performance Modeling and Comparative Analysis of the MILC Lattice QCD Application su3_rmd
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Scalable Multi-purpose Network Representation for Large Scale Distributed System Simulation
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
High-performance RMA-based broadcast on the Intel SCC
Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
Aspen: a domain specific language for performance modeling
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Communication avoiding and overlapping for numerical linear algebra
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
A scheduling toolkit for multiprocessor-task programming with dependencies
Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing
A new memory slowdown model for the characterization of computing systems
PaCT'07 Proceedings of the 9th international conference on Parallel Computing Technologies
Techniques for designing efficient parallel graph algorithms for SMPs and multicore processors
ISPA'07 Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications
Optimization of collective communications in HeteroMPI
PVM/MPI'07 Proceedings of the 14th European conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Netgauge: a network performance measurement framework
HPCC'07 Proceedings of the Third international conference on High Performance Computing and Communications
Towards a complexity model for design and analysis of PGAS-based algorithms
HPCC'07 Proceedings of the Third international conference on High Performance Computing and Communications
High performance checksum computation for fault-tolerant MPI over InfiniBand
EuroMPI'12 Proceedings of the 19th European conference on Recent Advances in the Message Passing Interface
Performance modelling of magnetohydrodynamics codes
EPEW'12 Proceedings of the 9th European conference on Computer Performance Engineering
Modeling communication in cache-coherent SMP systems: a case-study with Xeon Phi
Proceedings of the 22nd international symposium on High-performance parallel and distributed computing
Estimating parallel performance
Journal of Parallel and Distributed Computing
A survey of pipelined workflow scheduling: Models and algorithms
ACM Computing Surveys (CSUR)
Modeling synthetic aperture radar computation with Aspen
International Journal of High Performance Computing Applications
On the validity of flow-level TCP network models for grid and cloud simulations
ACM Transactions on Modeling and Computer Simulation (TOMACS)
A memory access model for highly-threaded many-core architectures
Future Generation Computer Systems
The Servet 3.0 benchmark suite: Characterization of network performance degradation
Computers and Electrical Engineering
A vast body of theoretical research has focused either on overly simplistic models of parallel computation, notably the PRAM, or on overly specific models that have few representatives in the real world. Both kinds of models encourage exploitation of formal loopholes rather than rewarding the development of techniques that yield performance across a range of current and future parallel machines. This paper offers a new parallel machine model, called LogP, that reflects the critical technology trends underlying parallel computers. It is intended to serve as a basis for developing fast, portable parallel algorithms and to offer guidelines to machine designers. Such a model must strike a balance between detail and simplicity in order to reveal important bottlenecks without making analysis of interesting problems intractable. The model is based on four parameters that abstractly specify the computing bandwidth, the communication bandwidth, the communication delay, and the efficiency of coupling communication and computation. Portable parallel algorithms typically adapt to the machine configuration in terms of these parameters. The utility of the model is demonstrated through examples that are implemented on the CM-5.
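The abstract describes the four parameters only abstractly. In the LogP model they are:

- L: an upper bound on the latency of a small message.
- o: the overhead during which a processor is busy sending or receiving.
- g: the gap, the minimum interval between consecutive sends or receives at one processor. Its reciprocal is the per-processor communication bandwidth.
- P: the number of processor/memory modules.

The Python sketch below shows how these parameters combine into simple cost estimates. It rests on several assumptions:

- The function names are hypothetical.
- It assumes g ≥ o and small fixed-size messages.
- The broadcast routine follows a greedy rule: every processor that already holds the datum forwards it to an uninformed processor as often as g allows.

It is an illustration under these assumptions, not the paper's own code.

```python
import heapq


def point_to_point_time(L, o):
    """Time for one small message: sender overhead + network latency + receiver overhead."""
    return o + L + o


def k_messages_time(k, L, o, g):
    """Time until the last of k back-to-back small messages from one sender is received.

    Assumes g >= o, so the sender can start a new message every g cycles.
    """
    return (k - 1) * g + point_to_point_time(L, o)


def greedy_broadcast_time(P, L, o, g):
    """Completion time of a greedy single-datum broadcast to P processors.

    Assumes g >= o. Every informed processor keeps sending to uninformed
    processors, one new send every g cycles, until all P hold the datum.
    """
    if P <= 1:
        return 0
    hop = point_to_point_time(L, o)
    # Each heap entry is a time at which some informed processor's next
    # message would be fully received by a new (uninformed) processor.
    pending = [hop]  # the root holds the datum at time 0 and sends at once
    finish = 0
    for _ in range(P - 1):
        t = heapq.heappop(pending)  # earliest possible delivery wins
        finish = t
        heapq.heappush(pending, t + g)    # that sender's next message, g later
        heapq.heappush(pending, t + hop)  # newly informed processor starts forwarding
    return finish


if __name__ == "__main__":
    # Illustrative parameter values, not measurements of any machine.
    L, o, g, P = 6, 2, 4, 8
    print(point_to_point_time(L, o))
    print(k_messages_time(3, L, o, g))
    print(greedy_broadcast_time(P, L, o, g))
```

Take the illustrative values L = 6, o = 2, g = 4 and P = 8:

- A single message costs 10 cycles.
- Three pipelined messages complete at 18.
- The greedy broadcast finishes at time 24.

The heap picks, at each step, the earliest moment any informed processor could deliver the datum to a new one. That is why the broadcast tree it produces is unbalanced: early-informed processors end up with more children.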