Irregular computation problems underlie many important scientific applications. Although these problems are computationally expensive, and so would seem appropriate for parallel machines, their irregular and unpredictable run-time behavior makes this type of parallel program difficult to write and adversely affects run-time performance.

This paper explores three issues (partitioning, mutual exclusion, and data transfer) that are crucial to the efficient execution of irregular problems on distributed-memory machines. Unlike previous work, we studied the same programs running in three alternative systems on the same hardware base (a Thinking Machines CM-5): the CHAOS irregular application library, Transparent Shared Memory (TSM), and eXtensible Shared Memory (XSM). CHAOS and XSM performed equivalently for all three applications. Both systems were somewhat (13%) to significantly (991%) faster than TSM.