Compilation and delayed evaluation in APL

  • Authors: Leo J. Guibas; Douglas K. Wyatt

  • Affiliations: Xerox Palo Alto Research Center, Palo Alto, Cal. (both authors)

  • Venue: POPL '78: Proceedings of the 5th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages
  • Year: 1978


Abstract

Most existing APL implementations are interpretive in nature; that is, each time an APL statement is encountered it is executed by a body of code that is perfectly general, i.e. capable of evaluating any APL expression, and is in no way tailored to the statement at hand. This costly generality is said to be justified because APL variables are typeless and thus can vary arbitrarily in type, shape, and size during the execution of a program. What this argument overlooks is that the operational semantics of an APL statement are not modified by the varying storage requirements of its variables.

The first proposal for a not fully interpretive implementation was the thesis of P. Abrams [1], in which a high-level interpreter can defer performing certain operations by compiling code which a low-level interpreter must later be called upon to execute. The benefit thus gained is that intelligence gathered from a wider context can be brought to bear on the evaluation of a subexpression. Thus on evaluating (A+B)[I], only the addition A[I]+B[I] will be performed. More recently, A. Perlis and several of his students at Yale [9,10] have presented a scheme by which a full-fledged APL compiler can be written. The compiled code generated can then be very efficiently executed on a specialized hardware processor. A similar scheme is used in the newly released HP/3000 APL [12].

This paper builds on and extends the above ideas in several directions. We start by studying in some depth the two key notions all this work has in common, namely compilation and delayed evaluation in the context of APL. By delayed evaluation we mean the strategy of deferring the computation of intermediate results until the moment they are needed. Thus large intermediate expressions are not built in storage; instead their elements are "streamed" in time. Delayed evaluation for APL was probably first proposed by Barton (see [8]).

Many APL operators do not correspond to any real data operations. Instead their effect is to rename the elements of the array they act upon. A wide class of such operators, which we will call the grid selectors, can be handled by essentially pushing them down the expression tree and incorporating their effect into the leaf accessors. Semantically this is equivalent to the drag-along transformations described by Abrams. Performing this optimization will be shown to be an integral part of delayed evaluation.
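To make these two ideas concrete, here is a minimal sketch of delayed evaluation with selector push-down, written in Python rather than in the paper's own framework: indexing a lazy sum rewrites (A+B)[I] into A[I]+B[I], so selection reaches the leaf accessors and only the selected additions are ever performed. The Leaf/Add classes and the select/stream methods are our own illustrative names, not the paper's.

    class Leaf:
        def __init__(self, data):
            self.data = data
        def select(self, idx):
            # absorb the selector into the leaf accessor
            return Leaf([self.data[i] for i in idx])
        def stream(self):
            # deliver elements on demand, one at a time
            yield from self.data

    class Add:
        def __init__(self, left, right):
            self.left, self.right = left, right
        def select(self, idx):
            # push the selector down the tree: (A+B)[I] -> A[I] + B[I]
            return Add(self.left.select(idx), self.right.select(idx))
        def stream(self):
            # no intermediate array is built; elements are streamed in time
            for a, b in zip(self.left.stream(), self.right.stream()):
                yield a + b

    A = Leaf([1, 2, 3, 4])
    B = Leaf([10, 20, 30, 40])
    print(list(Add(A, B).select([0, 2]).stream()))  # [11, 33]: only two additions occur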
In order to focus our attention on the above issues, we make a number of simplifying assumptions. We confine our attention to code compilation for single APL expressions, such as might occur in an "APL Calculator", where user-defined functions are not allowed. Of course we will be critically concerned with the re-usability of the compiled code for future evaluations. We also ignore the distinctions among the various APL primitive types and assume that all our arrays are of one uniform numeric type. We have studied the situation without these simplifying assumptions, but plan to report on this elsewhere.

The following is a list of the main contributions of this paper.

  • We present an algorithm for incorporating the selector operators into the accessors for the leaves of the expression tree. The algorithm runs in time proportional to the size of the tree, as opposed to its path length (which is the case for the algorithms of [10] and [12]). Although arbitrary reshapes cannot be handled by the above algorithm, an especially important case can be handled: that of a conforming reshape. The reshape A⍴B is called conforming if ⍴B is a suffix of A.

  • By using conforming reshapes we can eliminate inner and outer products from the expression tree and replace them with scalar operators and reductions along the last dimension. We do this by introducing appropriate selectors on the product arguments, then eventually absorbing these selectors into the leaf accessors. The same mechanism handles scalar extension, the convention of making scalar operands of scalar operators conform to arbitrary arrays.

  • Once products, scalar extensions, and selectors have been eliminated, what is left is an expression tree consisting entirely of scalar operators and reductions along the last dimension. As a consequence, during execution, the dimension currently being worked on obeys a strict stack-like discipline. This implies that we can generate extremely efficient code that is independent of the ranks of the arguments.

  • Several APL operators use the elements of their operands several times, so a pure delayed evaluation strategy would require multiple reevaluations. We introduce a general buffering mechanism, called slicing, which allows portions of a subexpression that will be repeatedly needed to be saved, to avoid future recomputation. Slicing is well integrated with the evaluation-on-demand mechanism. For example, when operators that break the streaming are encountered, slicing is used to determine the minimum size of the buffer required to bridge the order in which a subexpression can deliver its result and the order in which the full expression needs it (see the sketch below).

  • The compiled code is very efficient. A minimal number of loop variables is maintained, and accessors are shared among as many expression atoms as possible. Finally, the code generated is well suited for execution by an ordinary minicomputer, such as a PDP-11 or a Data General Nova. We have implemented this compiler on the Alto computer at Xerox PARC.

The plan of the paper is this: We start with a general discussion of compilation and delayed evaluation. Then we motivate the structures and algorithms we need to introduce by showing how to handle a wider and wider class of the primitive APL operators. We discuss various ways of tailoring an evaluator for a particular expression. Some of this tailoring is possible based only on the expression itself, while other optimizations require knowledge of the (sizes of) the atom bindings in the expression. The reader should always be alert to the kind of knowledge being used, for this affects the validity of the compiled code across reexecutions of a statement.
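The sketch promised above illustrates the slicing idea under our own assumptions, again in Python and with names of our own choosing rather than the paper's: reversal along the last axis breaks pure streaming, but the minimum buffer bridging the order of delivery and the order of demand is a single row (one slice whose size equals the last dimension), not the whole array.

    def reverse_last_axis(stream, row_len):
        # Consume a row-major element stream; emit each row reversed.
        # The slice buffer never holds more than row_len elements.
        buffer = []
        for x in stream:
            buffer.append(x)
            if len(buffer) == row_len:          # a full row has been delivered
                yield from reversed(buffer)     # re-emit it in the order demanded
                buffer.clear()

    # a 2x3 matrix streamed in row-major order
    elems = iter([1, 2, 3, 4, 5, 6])
    print(list(reverse_last_axis(elems, 3)))    # [3, 2, 1, 6, 5, 4]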