Type architectures, shared memory, and the corollary of modest potential
Annual review of computer science vol. 1, 1986
Towards an architecture-independent analysis of parallel algorithms
STOC '88 Proceedings of the twentieth annual ACM symposium on Theory of computing
SPAA '89 Proceedings of the first annual ACM symposium on Parallel algorithms and architectures
Communication complexity of PRAMs
Theoretical Computer Science - Special issue: Fifteenth international colloquium on automata, languages and programming, Tampere, Finland, July 1988
A bridging model for parallel computation
Communications of the ACM
A comparison of sorting algorithms for the connection machine CM-2
SPAA '91 Proceedings of the third annual ACM symposium on Parallel algorithms and architectures
Introduction to parallel algorithms and architectures: arrays, trees, hypercubes
Efficient PRAM simulation on a distributed memory machine
STOC '92 Proceedings of the twenty-fourth annual ACM symposium on Theory of computing
LogP: towards a realistic model of parallel computation
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Optimal broadcast and summation in the LogP model
SPAA '93 Proceedings of the fifth annual ACM symposium on Parallel algorithms and architectures
Fast Parallel Sorting Under LogP: Experience with the CM-5
IEEE Transactions on Parallel and Distributed Systems
Assessing Fast Network Interfaces
IEEE Micro
Parallelism in random access machines
STOC '78 Proceedings of the tenth annual ACM symposium on Theory of computing
Solving triangular linear systems in parallel using substitution
SPDP '95 Proceedings of the 7th IEEE Symposium on Parallel and Distributed Processing
Parallel computation still not ready for the mainstream
Communications of the ACM
From algorithm parallelism to instruction-level parallelism: an encode-decode chain using prefix-sum
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Auto-blocking matrix-multiplication or tracking BLAS3 performance from source code
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Contention in shared memory algorithms
Journal of the ACM (JACM)
One-bit counts between unique and sticky
Proceedings of the 1st international symposium on Memory management
BOS is boss: a case for bulk-synchronous object systems
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
Selecting tile shape for minimal execution time
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
Efficient distributed algorithms to build inverted files
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Optimal Clustering of Tree-Sweep Computations for High-Latency Parallel Environments
IEEE Transactions on Parallel and Distributed Systems
Compression using efficient multicasting
STOC '00 Proceedings of the thirty-second annual ACM symposium on Theory of computing
Optimal schedules for data-parallel cycle-stealing in networks of workstations (extended abstract)
Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures
Task Allocation on a Network of Processors
IEEE Transactions on Computers
Language support for Morton-order matrices
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Optimal Schedules for Cycle-Stealing in a Network of Workstations with a Bag-of-Tasks Workload
IEEE Transactions on Parallel and Distributed Systems
Optimal and efficient algorithms for summing and prefix summing on parallel machines
Journal of Parallel and Distributed Computing
On scheduling send-graphs and receive-graphs under the LogP-model
Information Processing Letters
IEEE Transactions on Parallel and Distributed Systems
Parallel Bridging Models and Their Impact on Algorithm Design
ICCS '01 Proceedings of the International Conference on Computational Science-Part II
On the Effectiveness of D-BSP as a Bridging Model of Parallel Computation
ICCS '01 Proceedings of the International Conference on Computational Science-Part II
Efficient Parallel Algorithms for 2-Dimensional Ising Spin Models
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Fault-Tolerance for Token-based Synchronization Protocols
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Improving Parallel Job Scheduling Using Runtime Measurements
IPDPS '00/JSSPP '00 Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing
Ahnentafel Indexing into Morton-Ordered Arrays, or Matrix Locality for Free
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
On the Predictive Quality of BSP-like Cost Functions for NOWs
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Scheduling Iterative Programs onto LogP-Machine
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Approximation Algorithms for Scheduling Malleable Tasks under Precedence Constraints
ESA '01 Proceedings of the 9th Annual European Symposium on Algorithms
A Data-Clustering Algorithm on Distributed Memory Multiprocessors
Revised Papers from Large-Scale Parallel Data Mining, Workshop on Large-Scale Parallel KDD Systems, SIGKDD
IPDPS '00 Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing
On the Parallel Execution Time of Tiled Loops
IEEE Transactions on Parallel and Distributed Systems
Distributing and scheduling divisible task on parallel communicating processors
Journal of Computer Science and Technology
Optimal sharing of bags of tasks in heterogeneous clusters
Proceedings of the fifteenth annual ACM symposium on Parallel algorithms and architectures
Parallel Complexity of Matrix Multiplication
The Journal of Supercomputing
QR factorization with Morton-ordered quadtree matrices for memory re-use and parallelism
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Asymptotically Optimal Worksharing in HNOWs: How Long is "Sufficiently Long?"
ANSS '03 Proceedings of the 36th annual symposium on Simulation
A Methodology for Designing Efficient On-Chip Interconnects on Well-Behaved Communication Patterns
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Δ-stepping: a parallelizable shortest path algorithm
Journal of Algorithms
Predicting and Evaluating Distributed Communication Performance
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
MRNet: A Software-Based Multicast/Reduction Network for Scalable Tools
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
The Opie compiler from row-major source to Morton-ordered matrices
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Journal of Parallel and Distributed Computing
Efficient trigger-broadcasting in heterogeneous clusters
Journal of Parallel and Distributed Computing
Scheduling malleable tasks with precedence constraints
Proceedings of the seventeenth annual ACM symposium on Parallelism in algorithms and architectures
International Journal of High Performance Computing Applications
Performance Evaluation of Linear Algebra Routines
International Journal of High Performance Computing Applications
Application Resource Requirement Estimation in a Parallel-Pipeline Model of Execution
IEEE Transactions on Parallel and Distributed Systems
A Design Methodology for Efficient Application-Specific On-Chip Interconnects
IEEE Transactions on Parallel and Distributed Systems
Optimal and efficient parallel tridiagonal solvers using direct methods
The Journal of Supercomputing - Special issue: Parallel and distributed processing and applications
International Journal of Parallel Programming
An approximation algorithm for scheduling malleable tasks under general precedence constraints
ACM Transactions on Algorithms (TALG)
Scalable, fault tolerant membership for MPI tasks on HPC systems
Proceedings of the 20th annual international conference on Supercomputing
Parallel bisecting k-means with prediction clustering algorithm
The Journal of Supercomputing
Synchronous parallel kinetic Monte Carlo for continuum diffusion-reaction systems
Journal of Computational Physics
Foundations for the integration of scheduling techniques into compilers for parallel languages
International Journal of Computational Science and Engineering
Efficient algorithms for parallelizing Monte Carlo simulations for 2D Ising spin models
The Journal of Supercomputing
A regression-based approach to scalability prediction
Proceedings of the 22nd annual international conference on Supercomputing
Optimal broadcast for fully connected processor-node networks
Journal of Parallel and Distributed Computing
A framework for adaptive collective communications for heterogeneous hierarchical computing systems
Journal of Computer and System Sciences
Adaptive approaches for efficient parallel algorithms on cluster-based systems
International Journal of Grid and Utility Computing
Mapping parallelism to multi-cores: a machine learning based approach
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
What the parallel-processing community has (failed) to offer the multi/many-core generation
Journal of Parallel and Distributed Computing
On designing optimal parallel triangular solvers
Information and Computation
Algorithms for memory hierarchies: advanced lectures
Evolutionary Rough Parallel Multi-Objective Optimization Algorithm
Fundamenta Informaticae
Partitioning streaming parallelism for multi-cores: a machine learning based approach
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Fitting FFT onto an energy efficient massively parallel architecture
Proceedings of the Second International Forum on Next-Generation Multicore/Manycore Technologies
Algorithms and theory of computation handbook
A bridging model for multi-core computing
Journal of Computer and System Sciences
Contention-aware scheduling with task duplication
Journal of Parallel and Distributed Computing
Algorithm engineering: bridging the gap between algorithm theory and practice
The round complexity of distributed sorting: extended abstract
Proceedings of the 30th annual ACM SIGACT-SIGOPS symposium on Principles of distributed computing
A contention-aware performance model for HPC-based networks: a case study of the InfiniBand network
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
Proceedings of the 2nd ACM Symposium on Cloud Computing
Scheduling malleable tasks with precedence constraints
Journal of Computer and System Sciences
Total exchange performance modelling under network contention
PPAM'05 Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics
Multi-DaC programming model: a variant of multi-BSP model for divide-and-conquer algorithms
DAMP '12 Proceedings of the 7th workshop on Declarative aspects and applications of multicore programming
Optimal broadcast for fully connected networks
HPCC'05 Proceedings of the First international conference on High Performance Computing and Communications
An approximation algorithm for scheduling malleable tasks under general precedence constraints
ISAAC'05 Proceedings of the 16th international conference on Algorithms and Computation
Prediction of communication latency over complex network behaviors on SMP clusters
EPEW'05/WS-FM'05 Proceedings of the 2005 international conference on European Performance Engineering, and Web Services and Formal Methods, international conference on Formal Techniques for Computer Systems and Business Processes
Compiler-Directed performance model construction for parallel programs
ARCS'10 Proceedings of the 23rd international conference on Architecture of Computing Systems
Performance analysis and optimization of MPI collective operations on multi-core clusters
The Journal of Supercomputing
Multifaceted web services: an approach to secure and scalable grid scheduling
EuroWeb'02 Proceedings of the 2002 international conference on EuroWeb
Self-consistent MPI performance requirements
PVM/MPI'07 Proceedings of the 14th European conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface
International Journal of High Performance Computing Applications
Journal of Computer and System Sciences
Using machine learning to partition streaming programs
ACM Transactions on Architecture and Code Optimization (TACO)
Making queries tractable on big data with preprocessing: through the eyes of complexity theory
Proceedings of the VLDB Endowment
Measurement of the latency parameters of the Multi-BSP model: a multicore benchmarking approach
The Journal of Supercomputing