Characterizing a new class of threads in scientific applications for high end supercomputers

Authors:
Arun Rodrigues;Richard Murphy;Peter Kogge;Keith Underwood
Affiliations:
University of Notre Dame, Notre Dame, IN;University of Notre Dame, Notre Dame, IN;University of Notre Dame, Notre Dame, IN;Sandia National Lab, Albuquerque, NM
Venue:
Proceedings of the 18th annual international conference on Supercomputing
Year:
2004

Abstract

Chip-level multithreading is growing in use throughout the microprocessor world, as evidenced by the Intel Pentium 4 and the upcoming innovations in the POWER architecture. These processors typically use a few coarse-grained threads that can be difficult for the programmer or compiler to exploit. Processing in Memory (PIM), by contrast, is a technology that has been explored through a long series of supercomputer projects as a facilitator for different multithreaded execution models. In the multithreading model explored by PIMs, the threads can have radically different characteristics. Specifically, PIMs seek to exploit a large number of very fine-grained threads to hide memory access latency and increase parallelism. PIM supports these small threads, or "threadlets", by providing a fast hardware synchronization mechanism, hardware management of thread creation and destruction, and a "shared register" approach that extends the shared-memory thread model. This paper analyzes some very large scientific codes in terms of how they might be mapped onto such a multithreading model, with a focus on extremely fine-grained threads.