Most existing APL implementations are interpretive in nature; that is, each time an APL statement is encountered it is executed by a body of code that is perfectly general, i.e. capable of evaluating any APL expression, and is in no way tailored to the statement at hand. This costly generality is said to be justified because APL variables are typeless and thus can vary arbitrarily in type, shape, and size during the execution of a program. What this argument overlooks is that the operational semantics of an APL statement are not modified by the varying storage requirements of its variables.

The first proposal for a not fully interpretive implementation was the thesis of P. Abrams [1], in which a high-level interpreter can defer performing certain operations by compiling code which a low-level interpreter must later be called upon to execute. The benefit thus gained is that intelligence gathered from a wider context can be brought to bear on the evaluation of a subexpression. Thus on evaluating (A+B)[I], only the addition A[I]+B[I] will be performed. More recently, A. Perlis and several of his students at Yale [9,10] have presented a scheme by which a full-fledged APL compiler can be written. The compiled code generated can then be executed very efficiently on a specialized hardware processor. A similar scheme is used in the newly released HP/3000 APL [12].

This paper builds on and extends the above ideas in several directions. We start by studying in some depth the two key notions all this work has in common, namely compilation and delayed evaluation in the context of APL. By delayed evaluation we mean the strategy of deferring the computation of intermediate results until the moment they are needed. Thus large intermediate expressions are not built in storage; instead their elements are "streamed" in time. Delayed evaluation for APL was probably first proposed by Barton (see [8]).

Many APL operators do not correspond to any real data operations.
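To make the deferral concrete, delayed evaluation in the sense above can be sketched as expression nodes that expose element accessors. This is only an illustrative miniature, not the paper's implementation; the names Leaf, Add, and index are hypothetical. Building A+B allocates no intermediate array, and indexing the sum computes only the selected elements:

```python
# Sketch of delayed evaluation: each node exposes an element accessor,
# so A+B builds no intermediate array; (A+B)[I] touches only the
# elements selected by I.

class Leaf:
    def __init__(self, data):
        self.data = data

    def at(self, i):                    # element accessor
        return self.data[i]

class Add:
    def __init__(self, left, right):
        self.left, self.right = left, right

    def at(self, i):                    # defer: combine elements on demand
        return self.left.at(i) + self.right.at(i)

def index(expr, idxs):
    """(expr)[idxs] -- evaluates only the selected elements."""
    return [expr.at(i) for i in idxs]

A = Leaf([10, 20, 30, 40])
B = Leaf([1, 2, 3, 4])
print(index(Add(A, B), [0, 2]))         # [11, 33]
```

With I selecting positions 0 and 2, exactly the additions A[0]+B[0] and A[2]+B[2] are performed, mirroring the Abrams-style deferral of (A+B)[I] described above.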
Instead their effect is to rename the elements of the array they act upon. A wide class of such operators, which we will call the grid selectors, can be handled by essentially pushing them down the expression tree and incorporating their effect into the leaf accessors. Semantically this is equivalent to the drag-along transformations described by Abrams. Performing this optimization will be shown to be an integral part of delayed evaluation.

In order to focus our attention on the above issues, we make a number of simplifying assumptions. We confine our attention to code compilation for single APL expressions, such as might occur in an "APL calculator", where user-defined functions are not allowed. Of course we will be critically concerned with the re-usability of the compiled code for future evaluations. We also ignore the distinctions among the various APL primitive types and assume that all our arrays are of one uniform numeric type. We have studied the situation without these simplifying assumptions, but plan to report on this elsewhere.

The following is a list of the main contributions of this paper.

- We present an algorithm for incorporating the selector operators into the accessors for the leaves of the expression tree. The algorithm runs in time proportional to the size of the tree, as opposed to its path length (which is the case for the algorithms of [10] and [12]). Although arbitrary reshapes cannot be handled by the above algorithm, an especially important case can: that of a conforming reshape. The reshape AρB is called conforming if ρB is a suffix of A.

- By using conforming reshapes we can eliminate inner and outer products from the expression tree and replace them with scalar operators and reductions along the last dimension. We do this by introducing appropriate selectors on the product arguments, then eventually absorbing these selectors into the leaf accessors.
The same mechanism handles scalar extension, the convention of making scalar operands of scalar operators conform to arbitrary arrays.

- Once products, scalar extensions, and selectors have been eliminated, what is left is an expression tree consisting entirely of scalar operators and reductions along the last dimension. As a consequence, during execution, the dimension currently being worked on obeys a strict stack-like discipline. This implies that we can generate extremely efficient code that is independent of the ranks of the arguments.

Several APL operators use the elements of their operands several times. A pure delayed evaluation strategy would require multiple re-evaluations.

- We introduce a general buffering mechanism, called slicing, which allows portions of a subexpression that will be repeatedly needed to be saved, to avoid future recomputation. Slicing is well integrated with the evaluation-on-demand mechanism. For example, when operators that break the streaming are encountered, slicing is used to determine the minimum size buffer required between the order in which a subexpression can deliver its result and the order in which the full expression needs it.

- The compiled code is very efficient. A minimal number of loop variables is maintained, and accessors are shared among as many expression atoms as possible. Finally, the code generated is well suited for execution by an ordinary minicomputer, such as a PDP-11 or a Data General Nova. We have implemented this compiler on the Alto computer at Xerox PARC.

The plan of the paper is this: we start with a general discussion of compilation and delayed evaluation. Then we motivate the structures and algorithms we need to introduce by showing how to handle a wider and wider class of the primitive APL operators. We discuss various ways of tailoring an evaluator for a particular expression.
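The selector absorption described in the contributions above can be illustrated in miniature. The one-dimensional setting and the names (Leaf, absorb_reverse) are hypothetical simplifications, not the paper's algorithm: a selector becomes an index transformation composed into the leaf's accessor, rather than a data operation that copies the array.

```python
# Sketch of pushing a grid selector down to a leaf accessor: reversal is
# absorbed by composing index maps, so no renamed copy of the array is
# ever materialized.

class Leaf:
    def __init__(self, data):
        self.data = data
        self.index_map = lambda i: i          # identity index transform

    def at(self, i):
        return self.data[self.index_map(i)]

def absorb_reverse(leaf, n):
    """Fold a reverse selector over n elements into the leaf accessor."""
    inner = leaf.index_map
    leaf.index_map = lambda i: inner(n - 1 - i)
    return leaf

B = absorb_reverse(Leaf([1, 2, 3, 4]), 4)
print([B.at(i) for i in range(4)])            # [4, 3, 2, 1]
```

Composing two selectors composes their index maps, which is why a chain of renaming operators collapses into a single accessor at the leaf.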
Some of this tailoring is possible based only on the expression itself, while other optimizations require knowledge of (the sizes of) the atom bindings in the expression. The reader should always be alert to the kind of knowledge being used, for this affects the validity of the compiled code across re-executions of a statement.
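The slicing mechanism can be pictured with a small streaming sketch. The generator below (reverse_rows, a hypothetical name, not the paper's code) stands in for an operator that breaks the streaming: the producer delivers each row's elements forward, the consumer needs them reversed, and the minimum buffer between the two orders is a single row rather than the whole array.

```python
# Sketch of slicing: buffer only the minimal window needed to reconcile
# the producer's delivery order with the consumer's demand order.

def reverse_rows(row_stream, row_len):
    for row in row_stream:
        buf = list(row)                       # slice: buffer one row only
        for k in range(row_len - 1, -1, -1):  # replay the row in reverse
            yield buf[k]

stream = iter([[1, 2, 3], [4, 5, 6]])
print(list(reverse_rows(stream, 3)))          # [3, 2, 1, 6, 5, 4]
```

Once a row has been replayed, its buffer is discarded, so space stays proportional to the slice rather than to the subexpression's full result.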