Supporting dynamic data structures on distributed-memory machines

Authors:
Anne Rogers;Martin C. Carlisle;John H. Reppy;Laurie J. Hendren
Affiliations:
Princeton Univ., Princeton, NJ;Princeton Univ., Princeton, NJ;AT&T Bell Labs, Murray Hill, NJ;McGill Univ., Montreal, P.Q., Canada
Venue:
ACM Transactions on Programming Languages and Systems (TOPLAS)
Year:
1995

Abstract

Compiling for distributed-memory machines has been a very active research area in recent years. Much of this work has concentrated on programs that use arrays as their primary data structures. To date, little work has been done to address the problem of supporting programs that use pointer-based dynamic data structures. The techniques developed for supporting SPMD execution of array-based programs rely on the fact that arrays are statically defined and directly addressable. Recursive data structures do not have these properties, so new techniques must be developed. In this article, we describe an execution model for supporting programs that use pointer-based dynamic data structures. This model uses a simple mechanism for migrating a thread of control based on the layout of heap-allocated data and introduces parallelism using a technique based on futures and lazy task creation. We intend to exploit this execution model using compiler analyses and automatic parallelization techniques. We have implemented a prototype system, which we call Olden, that runs on the Intel iPSC/860 and the Thinking Machines CM-5. We discuss our implementation and report on experiments with five benchmarks.
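The migration idea in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not Olden's implementation: the names `Node`, `home`, `SimThread`, `deref`, and `tree_sum` are hypothetical, each heap node is assumed to carry the id of the processor that owns it, and the "thread" simply moves to the owning processor whenever it dereferences a non-local node. Futures and lazy task creation are not modeled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    home: int                      # hypothetical: id of the processor owning this node
    left: Optional["Node"] = None
    right: Optional["Node"] = None


class SimThread:
    """Simulated thread of control that records which processor it runs on."""

    def __init__(self, start: int = 0) -> None:
        self.proc = start
        self.migrations = 0

    def deref(self, node: Node) -> Node:
        # Move the computation to the data instead of fetching the data.
        if node.home != self.proc:
            self.proc = node.home
            self.migrations += 1
        return node


def tree_sum(t: SimThread, node: Optional[Node]) -> int:
    """Sum a tree, migrating whenever the next node lives on another processor."""
    if node is None:
        return 0
    n = t.deref(node)
    # Read every field while the node is local, before recursing.
    value, left, right = n.value, n.left, n.right
    return value + tree_sum(t, left) + tree_sum(t, right)


# Root and left child on processor 0, right subtree on processor 1.
root = Node(1, home=0,
            left=Node(2, home=0),
            right=Node(3, home=1, left=Node(4, home=1)))
thread = SimThread(start=0)
total = tree_sum(thread, root)
print(total, thread.migrations)
```

In this toy layout the traversal crosses a processor boundary only once, when it enters the right subtree. That shows why the placement of heap-allocated data drives the cost of the model: a layout that keeps each subtree on one processor needs few migrations.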