Chip-level multithreading is growing in use throughout the microprocessor world, as evidenced by the Intel Pentium 4 and the upcoming innovations in the POWER architecture. These processors typically use a few coarse-grained threads that can be difficult for the programmer or compiler to exploit. Processing in Memory (PIM), by contrast, is a technology that has been explored through a long series of supercomputer projects as a facilitator for a different multithreaded execution model. In the multithreading model explored by PIMs, the threads can have radically different characteristics. Specifically, PIMs seek to exploit a large number of very fine-grained threads to hide memory access latency and increase parallelism. PIM supports these small threads, or "threadlets", by providing a fast hardware synchronization mechanism, hardware management of thread creation and destruction, and a "shared register" approach that extends the shared-memory thread model. This paper analyzes how some very large scientific codes might be mapped onto such a multithreading model, with a focus on extremely fine-grained threads.