FFTs in external or hierarchical memory
The Journal of Supercomputing
A comparison of sorting algorithms for the connection machine CM-2
SPAA '91 Proceedings of the third annual ACM symposium on Parallel algorithms and architectures
Integrating message-passing and shared-memory: early experience
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
On the versatility of parallel sorting by regular sampling
Parallel Computing
A comparison of message passing and shared memory architectures for data parallel programs
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Where is time spent in message-passing and shared-memory programs?
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The performance advantages of integrating block data transfer in cache-coherent multiprocessors
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Implications of hierarchical N-body methods for multiprocessor architectures
ACM Transactions on Computer Systems (TOCS)
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Quantifying the performance differences between PVM and TreadMarks
Journal of Parallel and Distributed Computing
IO-lite: a unified I/O buffering and caching system
OSDI '99 Proceedings of the third symposium on Operating systems design and implementation
Scaling application performance on a cache-coherent multiprocessor
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
CANPC '98 Proceedings of the Second International Workshop on Network-Based Parallel Computing: Communication, Architecture, and Applications
Programming FFT on DSM Multiprocessors
HPC '00 Proceedings of the The Fourth International Conference on High-Performance Computing in the Asia-Pacific Region-Volume 2 - Volume 2
A comparison of three programming models for adaptive applications on the origin2000
Journal of Parallel and Distributed Computing
Message passing and shared address space parallelism on an SMP cluster
Parallel Computing
Assessing the potential of hybrid hpc systems for scientific applications: a case study
Proceedings of the 4th international conference on Computing frontiers
Experiences using OpenMP based on compiler directed software DSM on a PC cluster
WOMPAT'03 Proceedings of the OpenMP applications and tools 2003 international conference on OpenMP shared memory parallel programming
WOMPAT'04 Proceedings of the 5th international conference on OpenMP Applications and Tools: shared Memory Parallel Programming with OpenMP
Remote store programming: a memory model for embedded multicore
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Hi-index | 0.02 |
We compare the performance of three major programming models on a modern, 64-processor hardware cache-coherent machine, one of the two major types of platforms upon which high-performance computing is converging. We focus on applications that are either regular, predictable or at least do not require fine-grained dynamic replication of irregularly accessed data. Within this class, we use programs with a range of important communication patterns. We examine whether the basic parallel algorithm and communication structuring approaches needed for best performance are similar or different among the models, whether some models have substantial performance advantages over others as problem size and number of processors change, what the sources of these performance differences are, where the programs spend their time, and whether substantial improvements can be obtained by modifying either the application programming interfaces or the implementations of the programming models on this type of tightly-coupled multiprocessor platform.