Monoids for rapid data flow analysis

  • Authors: Barry K. Rosen
  • Affiliations: IBM Thomas J. Watson Research Center, Yorktown Heights, New York
  • Venue: POPL '78 Proceedings of the 5th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages
  • Year: 1978


Abstract

The earliest data flow analysis research dealt with concrete problems (such as detection of available expressions) and with low level representations of control flow (with one large graph, each of whose nodes represents a basic block). Several recent papers have introduced an abstract approach, dealing with any problem expressible in terms of a semilattice L and a monoid M of isotone maps from L to L, under various algebraic constraints. Examples include [CC77; GW76; KU76; Ki73; Ta75; Ta76; We75]. Several other recent papers have introduced a high level representation with many small graphs, each of which represents a small portion of the control flow information in a program. The hierarchy of small graphs is explicit in [Ro77a; Ro77b] and implicit in papers that deal with syntax directed analysis of programs written within the confines of classical structured programming [DDH72, Sec. 1.7]. Examples include [TK76; ZB74]. The abstract papers have retained the low level representations, while the high level papers have retained the concrete problems of the earliest work. This paper studies abstract conditions on L and M that lead to rapid data flow analysis, with emphasis on high level representations. Unlike some analysis methods oriented toward structured programming [TK76; Wu75; ZB74], our method retains the ability to cope with arbitrary escape and jump statements while it exploits the control flow information implicit in the parse tree.

The general algebraic framework for data flow analysis with semilattices is presented in Section 2, along with some preliminary lemmas. Our "rapid" monoids properly include the "fast" monoids of [GW76]. Section 3 relates data flow problems to the hierarchies of small graphs introduced in [Ro77a; Ro77b]. High level analysis begins with local information expressed by mapping the arcs of a large graph into the monoid M, much as in low level analysis. But each arc in our small graphs represents a set (often an infinite set) of paths in the underlying large graph. Appropriate members of M are associated with these arcs. This "globalized" local information is used to solve global flow problems in Section 4. The fundamental theorem of Section 4 is applied to programs with the control structures of classical structured programming in Section 5. For a given rapid monoid M, the time required to solve any global data flow problem is linear in the number of statements in the program. (For varying M, the time is linear in the product of this number and t, where t is a parameter of M introduced in the definition of rapidity.) For reasons sketched at the beginning of Section 6, we feel obliged to cope with source level escape and jump statements as well as with classical structured programming. Section 6 shows how to apply the fundamental theorem of Section 4 to programs with arbitrary escapes and jumps. The explicit time bound is only derived for programs without jumps. A comparison between the results obtained by our method and those obtained by [GW76] is in Section 7, which, in the full paper, also contains examples of rapid monoids. Finally, Section 8 lists conclusions and open problems. Proofs of lemmas are omitted to save space. The full paper will be submitted to a journal.

We proceed from the general to the particular, except in some places where bending the rule a little makes a significant improvement in the expository flow. Common mathematical notation is used. To avoid excessive parentheses, the value of a function f at an argument x is fx rather than f(x). If fx is itself a function then (fx)y is the result of applying fx to y. The usual ≤ and ≥ symbols are used for arbitrary partial orders as well as for the usual order among integers. A function f from a partially ordered set (poset) to a poset is isotone iff x ≤ y implies fx ≤ fy. (Isotone maps are sometimes called "monotonic" in the literature.) A meet semilattice is a poset with a binary operation ∧ such that x ∧ y is the greatest lower bound of the set {x, y}. A meet semilattice wherein every subset has a greatest lower bound is complete. In particular, the empty subset has a greatest lower bound ⊤, so a complete meet semilattice has a maximum element. A monoid is a set together with an associative binary operation ∘ that has a unit element 1: 1 ∘ m = m ∘ 1 = m for all m. In all our examples the monoid M will be a monoid of functions: every member of M is a function (from a set into itself), the operation ∘ is the usual composition (f ∘ g)x = f(gx), and the unit 1 is the identity function with 1x = x for all x.

Two considerations governed the notational choices. First, we speak in ways that are common in mathematics and are convenient here. Second, we try to facilitate comparisons with [GW76; KU76; Ro77b], to the extent that the disparities among these works permit. One disparity is between the meet semilattices of [GW76; KU76; Ki73] and the join semilattices of [Ro77b; Ta75; We75], where least upper bounds are considered instead of greatest lower bounds. To speak of meets is more natural in applications that are intuitively stated in terms of "what must happen on all paths" in some class of paths in a program, while to speak of joins is more natural in applications that are intuitively stated in terms of "what can happen on some paths." By checking whether there are any paths in the relevant class and by using the rule that ∃ is equivalent to ¬∀¬, join oriented applications can be reduced to meet oriented ones (and vice versa). A general theory should speak in one way or the other, and we have chosen meets. For us, strong assertions about a program's data flow are high in the semilattice.
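
To make the algebraic setup concrete, the following is a minimal sketch in Haskell, assuming the classic available-expressions instance; it illustrates the framework described above and is not code from the paper. L is the meet semilattice of sets of expressions with meet given by intersection, and M is a monoid of isotone gen/kill transfer functions under composition; all identifiers (mkM, apply, meetM, and so on) are illustrative assumptions.

    -- Sketch of the framework, not code from the paper. L is the meet
    -- semilattice of available-expression sets (meet = intersection, so a
    -- fact survives only if it holds on all paths); M is a monoid of
    -- isotone gen/kill transfer functions under composition.
    import           Data.Set (Set)
    import qualified Data.Set as Set

    type L = Set String              -- data flow facts: available expressions

    meetL :: L -> L -> L             -- greatest lower bound in L
    meetL = Set.intersection

    -- A member of M, applied as  apply (M g k) x = g `union` (x \\ k).
    -- The smart constructor keeps gen and kill disjoint so that the
    -- pointwise meet below stays in gen/kill form.
    data M = M { gen :: L, kill :: L } deriving Show

    mkM :: L -> L -> M
    mkM g k = M g (k `Set.difference` g)

    apply :: M -> L -> L
    apply (M g k) x = g `Set.union` (x `Set.difference` k)

    unitM :: M                       -- the identity map, unit of M
    unitM = mkM Set.empty Set.empty

    -- Composition in M, in closed form:
    -- apply (f `o` g) x == apply f (apply g x).
    o :: M -> M -> M
    o f g = mkM (gen f `Set.union` (gen g `Set.difference` kill f))
                (kill f `Set.union` kill g)

    -- Pointwise meet of path functions built with mkM:
    -- apply (f `meetM` g) x == apply f x `meetL` apply g x.
    meetM :: M -> M -> M
    meetM f g = mkM (gen f `Set.intersection` gen g)
                    (kill f `Set.union` kill g)

    -- A diamond with two paths from entry to exit: the "globalized" local
    -- information for one small-graph arc is the meet of the composed
    -- path functions.
    main :: IO ()
    main = do
      let left   = mkM (Set.fromList ["a+b"]) Set.empty   -- computes a+b
          right  = mkM Set.empty (Set.fromList ["a+b"])   -- kills a+b
          after  = mkM (Set.fromList ["c*d"]) Set.empty   -- computes c*d
          pathFn = (after `o` left) `meetM` (after `o` right)
      print (apply pathFn (Set.fromList ["x-y"]))         -- fromList ["c*d","x-y"]

Under these assumptions the transfer functions are distributive, so the meet over even an infinite set of paths through a loop collapses after finitely many compositions and meets; conditions of roughly this kind are what the paper's notions of "fast" and "rapid" monoids make precise in the abstract setting.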