Using automatic persistent memoization to facilitate data analysis scripting
Proceedings of the 2011 International Symposium on Software Testing and Analysis
Computational scientists often prototype data analysis scripts using high-level languages like Python. To speed up execution times, they manually refactor their scripts into stages (separate functions) and write extra code to save intermediate results to disk in order to avoid recomputing them in subsequent runs. To eliminate this burden, we enhanced the Python interpreter to automatically memoize (save) the results of long-running function executions to disk, manage dependencies between code edits and saved results, and re-use memoized results rather than re-executing those functions when guaranteed safe to do so. There is a ~20% run-time slowdown during the initial run, but subsequent runs can speed up by several orders of magnitude. Using our enhanced interpreter, scientists can write simple and maintainable code that also runs fast after minor edits, without having to learn any new programming languages or constructs.
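The core idea — caching long-running function results on disk and invalidating them when the code changes — can be approximated in plain Python with a decorator. The sketch below is illustrative only, not the paper's interpreter-level implementation: the cache directory name and the use of the function's bytecode as a crude stand-in for full dependency tracking are assumptions.

```python
import functools
import hashlib
import os
import pickle

CACHE_DIR = ".memo_cache"  # hypothetical on-disk cache location


def persistent_memoize(func):
    """Cache func's return value on disk, keyed by the function's
    bytecode and its arguments, so re-runs reuse saved results unless
    the code or the inputs change."""
    os.makedirs(CACHE_DIR, exist_ok=True)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Hashing the bytecode invalidates the cache after code edits;
        # the real system tracks dependencies far more precisely.
        key = hashlib.sha256(
            pickle.dumps((func.__code__.co_code, args, sorted(kwargs.items())))
        ).hexdigest()
        path = os.path.join(CACHE_DIR, key + ".pkl")
        if os.path.exists(path):
            with open(path, "rb") as f:
                return pickle.load(f)  # reuse the memoized result
        result = func(*args, **kwargs)
        with open(path, "wb") as f:
            pickle.dump(result, f)  # save for subsequent runs
        return result

    return wrapper


@persistent_memoize
def expensive_analysis(n):
    # stand-in for a long-running computation
    return sum(i * i for i in range(n))
```

The first call to `expensive_analysis(10**6)` computes and writes the result to disk; later calls with the same argument (even in a fresh process) load it from the cache. Unlike this sketch, the enhanced interpreter also checks that reuse is safe, e.g. that the function is deterministic and its external dependencies are unchanged.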