Continuing improvements in semiconductor fabrication density are enabling new classes of System-on-a-Chip (SoC) architectures that combine extensive processing logic with high-density memory. Such architectures are generally called Processor-in-Memory (PIM) or Intelligent Memory (I-RAM), and they can support high-performance computing by narrowing the performance gap between the processor and the memory. A PIM architecture combines several processors in a single system, and these processors differ in their computation and memory-access capabilities. Exploiting them fully therefore requires a strategy that identifies each processor's capabilities and dispatches the most suitable jobs to it. Accordingly, this study presents an automatic source-to-source parallelizing system, called statement-analysis-grouping-evaluation (SAGE), that exploits the advantages of PIM architectures. Unlike conventional iteration-based parallelizing systems, SAGE analyzes programs at the statement level. This study addresses a PIM configuration with one host processor (i.e., the main processor in state-of-the-art computer systems) and one memory processor (i.e., the computing logic integrated with the memory). It also investigates the SAGE strategy, in which the original program is decomposed into blocks and a feasible execution schedule is produced for the host and memory processors. Experimental results for real benchmarks are also discussed.
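The dispatch idea described above can be illustrated with a minimal sketch. It is not the SAGE implementation: the `Block` type, the per-operation cost tables, and the greedy rule are all hypothetical, chosen only to show how blocks characterized by computation and memory-access counts might be assigned to a host processor or a memory processor. The sketch also ignores data dependences between blocks, which a feasible schedule would have to respect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A group of statements, summarized by two hypothetical metrics."""
    name: str
    compute_ops: int  # number of arithmetic operations in the block
    memory_ops: int   # number of memory accesses in the block


# Assumed per-operation costs in arbitrary time units (not from the paper):
# the host is taken to be fast at arithmetic but slow at reaching memory,
# and the memory processor the opposite.
HOST_COST = {"compute": 1, "memory": 10}
MEM_COST = {"compute": 4, "memory": 2}


def estimate(block, cost):
    """Estimated execution time of a block under a given cost table."""
    return block.compute_ops * cost["compute"] + block.memory_ops * cost["memory"]


def assign(blocks):
    """Greedily assign each block to the processor with the lower estimate."""
    schedule = {}
    for block in blocks:
        host_time = estimate(block, HOST_COST)
        mem_time = estimate(block, MEM_COST)
        schedule[block.name] = "host" if host_time <= mem_time else "memory"
    return schedule


blocks = [
    Block("compute_heavy", compute_ops=100, memory_ops=5),
    Block("memory_heavy", compute_ops=2, memory_ops=50),
]
schedule = assign(blocks)
```

With these assumed costs, the computation-heavy block goes to the host and the memory-heavy block goes to the memory processor. A realistic scheduler would additionally order the blocks by their dependences and account for the two processors running concurrently.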