Learning semantic string transformations from examples

  • Authors:
  • Rishabh Singh;Sumit Gulwani

  • Affiliations:
  • MIT CSAIL, Cambridge, MA;Microsoft Research, Redmond, WA

  • Venue:
  • Proceedings of the VLDB Endowment
  • Year:
  • 2012

Quantified Score

Hi-index 0.02

Visualization

Abstract

We address the problem of performing semantic transformations on strings, which may represent a variety of data types (or their combination) such as a column in a relational table, time, date, currency, etc. Unlike syntactic transformations, which are based on regular expressions and which interpret a string as a sequence of characters, semantic transformations additionally require exploiting the semantics of the data type represented by the string, which may be encoded as a database of relational tables. Manually performing such transformations on a large collection of strings is error prone and cumbersome, while programmatic solutions are beyond the skill-set of end-users. We present a programming by example technology that allows end-users to automate such repetitive tasks. We describe an expressive transformation language for semantic manipulation that combines table lookup operations and syntactic manipulations. We then present a synthesis algorithm that can learn all transformations in the language that are consistent with the user-provided set of input-output examples. We have implemented this technology as an add-in for the Microsoft Excel Spreadsheet system and have evaluated it successfully over several benchmarks picked from various Excel help-forums.