Typing ad hoc data

  • Authors:
  • Kathleen Fisher

  • Affiliations:
  • AT&T Labs Research

  • Venue:
  • TLDI '07 Proceedings of the 2007 ACM SIGPLAN international workshop on Types in languages design and implementation
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Traditionally, types describe the internal data manipulated by programs. To accommodate the variety of desired data structures, language designers and type theorists have developed a wide variety of types and type constructors. But not all useful data is in programs; in fact, enormous amounts of it sit on disks or stream by on wires in a dizzying array of encodings and formats. It turns out that many of the types developed for internal data can be used to describe external data: tuples, records, unions, options, and lists come to mind as obvious examples. Perhaps more surprisingly, recursive types, singletons, functions, parametric polymorphism, and dependent types are relevant as well. Using types to describe external data leads naturally to the insight that we can reuse the same type to define an internal data structure and to generate parsing and printing functions to map between the two representations. The PADS project [1] has exploited this idea, building data description languages based on the type structure of C (PADS/C [3] and on ML (PADS/ML [5] and exploring the theoretical basis for such languages with the Data Description Calculus (DDC) [4]. Other groups have also leveraged this insight, most closely the work on DataScript [2] and PacketTypes [6]. Continuing the analogy, it turns out that other concepts from the types world are also relevant to ad hoc data processing, including generic programming, type inference, type isomorphisms, and subtyping.In this talk, I will describe the domain of ad hoc data processing and explain how types enable precise descriptions of such data. I will then explore the question of type inference, describing quantitative techniques we are currently developing to construct a description of ad hoc data given example instances.