Compilation and delayed evaluation in APL

  • Authors: Leo J. Guibas; Douglas K. Wyatt

  • Affiliations: Xerox Palo Alto Research Center, Palo Alto, Cal. (both authors)

  • Venue: POPL '78: Proceedings of the 5th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages
  • Year: 1978


Abstract

Most existing APL implementations are interpretive in nature; that is, each time an APL statement is encountered it is executed by a body of code that is perfectly general, i.e. capable of evaluating any APL expression, and is in no way tailored to the statement at hand. This costly generality is said to be justified because APL variables are typeless and thus can vary arbitrarily in type, shape, and size during the execution of a program. What this argument overlooks is that the operational semantics of an APL statement are not modified by the varying storage requirements of its variables.

The first proposal for a not fully interpretive implementation was the thesis of P. Abrams [1], in which a high-level interpreter can defer performing certain operations by compiling code which a low-level interpreter must later be called upon to execute. The benefit thus gained is that intelligence gathered from a wider context can be brought to bear on the evaluation of a subexpression. Thus on evaluating (A+B)[I], only the addition A[I]+B[I] will be performed. More recently, A. Perlis and several of his students at Yale [9,10] have presented a scheme by which a full-fledged APL compiler can be written. The compiled code generated can then be very efficiently executed on a specialized hardware processor. A similar scheme is used in the newly released HP/3000 APL [12].

This paper builds on and extends the above ideas in several directions. We start by studying in some depth the two key notions all this work has in common, namely compilation and delayed evaluation in the context of APL. By delayed evaluation we mean the strategy of deferring the computation of intermediate results until the moment they are needed. Thus large intermediate expressions are not built in storage; instead their elements are "streamed" in time. Delayed evaluation for APL was probably first proposed by Barton (see [8]).

Many APL operators do not correspond to any real data operations. Instead their effect is to rename the elements of the array they act upon. A wide class of such operators, which we will call the grid selectors, can be handled by essentially pushing them down the expression tree and incorporating their effect into the leaf accessors. Semantically this is equivalent to the drag-along transformations described by Abrams. Performing this optimization will be shown to be an integral part of delayed evaluation.
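To make these two ideas concrete, here is a minimal sketch of delayed evaluation with selector push-down, written in Python rather than in the paper's own framework: indexing a lazy sum rewrites (A+B)[I] into A[I]+B[I], so selection reaches the leaf accessors and only the selected additions are ever performed. The Leaf/Add classes and the select/stream methods are our own illustrative names, not the paper's.

    class Leaf:
        def __init__(self, data):
            self.data = data
        def select(self, idx):
            # absorb the selector into the leaf accessor
            return Leaf([self.data[i] for i in idx])
        def stream(self):
            # deliver elements on demand, one at a time
            yield from self.data

    class Add:
        def __init__(self, left, right):
            self.left, self.right = left, right
        def select(self, idx):
            # push the selector down the tree: (A+B)[I] -> A[I] + B[I]
            return Add(self.left.select(idx), self.right.select(idx))
        def stream(self):
            # no intermediate array is built; elements are streamed in time
            for a, b in zip(self.left.stream(), self.right.stream()):
                yield a + b

    A = Leaf([1, 2, 3, 4])
    B = Leaf([10, 20, 30, 40])
    print(list(Add(A, B).select([0, 2]).stream()))  # [11, 33]: only two additions occur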
In order to focus our attention on the above issues, we make a number of simplifying assumptions. We confine our attention to code compilation for single APL expressions, such as might occur in an "APL Calculator", where user-defined functions are not allowed. Of course we will be critically concerned with the re-usability of the compiled code for future evaluations. We also ignore the distinctions among the various APL primitive types and assume that all our arrays are of one uniform numeric type. We have studied the situation without these simplifying assumptions, but plan to report on this elsewhere.

The following is a list of the main contributions of this paper.

  • We present an algorithm for incorporating the selector operators into the accessors for the leaves of the expression tree. The algorithm runs in time proportional to the size of the tree, as opposed to its path length (which is the case for the algorithms of [10] and [12]). Although arbitrary reshapes cannot be handled by the above algorithm, an especially important case can be handled: that of a conforming reshape. The reshape A⍴B is called conforming if ⍴B is a suffix of A.

  • By using conforming reshapes we can eliminate inner and outer products from the expression tree and replace them with scalar operators and reductions along the last dimension. We do this by introducing appropriate selectors on the product arguments, then eventually absorbing these selectors into the leaf accessors. The same mechanism handles scalar extension, the convention of making scalar operands of scalar operators conform to arbitrary arrays.

  • Once products, scalar extensions, and selectors have been eliminated, what is left is an expression tree consisting entirely of scalar operators and reductions along the last dimension. As a consequence, during execution, the dimension currently being worked on obeys a strict stack-like discipline. This implies that we can generate extremely efficient code that is independent of the ranks of the arguments.

  • Several APL operators use the elements of their operands several times, so a pure delayed evaluation strategy would require multiple reevaluations. We introduce a general buffering mechanism, called slicing, which allows portions of a subexpression that will be repeatedly needed to be saved, to avoid future recomputation. Slicing is well integrated with the evaluation-on-demand mechanism. For example, when operators that break the streaming are encountered, slicing is used to determine the minimum size of the buffer required to bridge the order in which a subexpression can deliver its result and the order in which the full expression needs it (see the sketch below).

  • The compiled code is very efficient. A minimal number of loop variables is maintained, and accessors are shared among as many expression atoms as possible. Finally, the code generated is well suited for execution by an ordinary minicomputer, such as a PDP-11 or a Data General Nova. We have implemented this compiler on the Alto computer at Xerox PARC.

The plan of the paper is this: We start with a general discussion of compilation and delayed evaluation. Then we motivate the structures and algorithms we need to introduce by showing how to handle a wider and wider class of the primitive APL operators. We discuss various ways of tailoring an evaluator for a particular expression. Some of this tailoring is possible based only on the expression itself, while other optimizations require knowledge of the (sizes of) the atom bindings in the expression. The reader should always be alert to the kind of knowledge being used, for this affects the validity of the compiled code across reexecutions of a statement.
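The sketch promised above illustrates the slicing idea under our own assumptions, again in Python and with names of our own choosing rather than the paper's: reversal along the last axis breaks pure streaming, but the minimum buffer bridging the order of delivery and the order of demand is a single row (one slice whose size equals the last dimension), not the whole array.

    def reverse_last_axis(stream, row_len):
        # Consume a row-major element stream; emit each row reversed.
        # The slice buffer never holds more than row_len elements.
        buffer = []
        for x in stream:
            buffer.append(x)
            if len(buffer) == row_len:          # a full row has been delivered
                yield from reversed(buffer)     # re-emit it in the order demanded
                buffer.clear()

    # a 2x3 matrix streamed in row-major order
    elems = iter([1, 2, 3, 4, 5, 6])
    print(list(reverse_last_axis(elems, 3)))    # [3, 2, 1, 6, 5, 4]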