The APRAM: incorporating asynchrony into the PRAM model
SPAA '89 Proceedings of the first annual ACM symposium on Parallel algorithms and architectures
Process coordination with fetch-and-increment
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Limits of control flow on parallelism
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Highly parallel computing (2nd ed.)
Highly parallel computing (2nd ed.)
SUIF: an infrastructure for research on parallelizing and optimizing compilers
ACM SIGPLAN Notices
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
ICS '90 Proceedings of the 4th international conference on Supercomputing
The SimpleScalar tool set, version 2.0
ACM SIGARCH Computer Architecture News
Explicit multi-threading (XMT) bridging models for instruction parallelism (extended abstract)
Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
The implementation of the Cilk-5 multithreaded language
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Active disks: programming model, algorithms and evaluation
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
A no-busy-wait balanced tree parallel algorithmic paradigm
Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures
Experiments with list ranking for explicit multi-threaded (XMT) instruction parallelism
Journal of Experimental Algorithmics (JEA)
Computer
IEEE Micro
Two techniques for reconciling algorithm parallelism with memory constraints
Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
Effectively sharing a cache among threads
Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures
Circulating shared-registers for multiprocessor systems
Journal of Systems Architecture: the EUROMICRO Journal
Fpga-based prototype of a pram-on-chip processor
Proceedings of the 5th conference on Computing frontiers
Case study of gate-level logic simulation on an extremely fine-grained chip multiprocessor
Journal of Embedded Computing - Issues in embedded single-chip multicore architectures
A pilot study to compare programming effort for two parallel programming models
Journal of Systems and Software
Mesh-of-trees and alternative interconnection networks for single-chip parallelism
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
A Low-Overhead Asynchronous Interconnection Network for GALS Chip Multiprocessors
NOCS '10 Proceedings of the 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip
Preliminary analysis of feasible benchmark problems for the hydrid PRAM/NUMA REPLICA architecture
Proceedings of the 13th International Conference on Computer Systems and Technologies
Hi-index | 0.00 |
Explicit-multithreading (XMT) is a parallel programming approach for exploiting on-chip parallelism. XMT introduces a computational framework with 1) a simple programming style that relies on fine-grained PRAM-style algorithms; 2) hardware support for low-overhead parallel threads, scalable load balancing, and efficient synchronization. The missing link between the algorithmic-programming level and the architecture level is provided by the first prototype XMT compiler. This paper also takes this new opportunity to evaluate the overall effectiveness of the interaction between the programming model and the hardware, and enhance its performance where needed, incorporating new optimizations into the XMT compiler. We present a wide range of applications, which written in XMT obtain significant speedups relative to the best serial programs. We show that XMT is especially useful for more advanced applications with dynamic, irregular access pattern, where for regular computations we demonstrate performance gains that scale up to much higher levels than have been demonstrated before for on-chip systems.