Using automatic persistent memoization to facilitate data analysis scripting
Proceedings of the 2011 International Symposium on Software Testing and Analysis
Computational scientists often prototype data analysis scripts using high-level languages like Python. To speed up execution times, they manually refactor their scripts into stages (separate functions) and write extra code to save intermediate results to disk in order to avoid recomputing them in subsequent runs. To eliminate this burden, we enhanced the Python interpreter to automatically memoize (save) the results of long-running function executions to disk, manage dependencies between code edits and saved results, and re-use memoized results rather than re-executing those functions when guaranteed safe to do so. There is a ~20% run-time slowdown during the initial run, but subsequent runs can speed up by several orders of magnitude. Using our enhanced interpreter, scientists can write simple and maintainable code that also runs fast after minor edits, without having to learn any new programming languages or constructs.
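The core idea — caching long-running function results on disk and invalidating them when the code changes — can be approximated in plain Python with a decorator. The sketch below is illustrative only, not the paper's interpreter-level implementation: the cache directory name and the use of the function's bytecode as a crude stand-in for full dependency tracking are assumptions.

```python
import functools
import hashlib
import os
import pickle

CACHE_DIR = ".memo_cache"  # hypothetical on-disk cache location


def persistent_memoize(func):
    """Cache func's return value on disk, keyed by the function's
    bytecode and its arguments, so re-runs reuse saved results unless
    the code or the inputs change."""
    os.makedirs(CACHE_DIR, exist_ok=True)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Hashing the bytecode invalidates the cache after code edits;
        # the real system tracks dependencies far more precisely.
        key = hashlib.sha256(
            pickle.dumps((func.__code__.co_code, args, sorted(kwargs.items())))
        ).hexdigest()
        path = os.path.join(CACHE_DIR, key + ".pkl")
        if os.path.exists(path):
            with open(path, "rb") as f:
                return pickle.load(f)  # reuse the memoized result
        result = func(*args, **kwargs)
        with open(path, "wb") as f:
            pickle.dump(result, f)  # save for subsequent runs
        return result

    return wrapper


@persistent_memoize
def expensive_analysis(n):
    # stand-in for a long-running computation
    return sum(i * i for i in range(n))
```

The first call to `expensive_analysis(10**6)` computes and writes the result to disk; later calls with the same argument (even in a fresh process) load it from the cache. Unlike this sketch, the enhanced interpreter also checks that reuse is safe, e.g. that the function is deterministic and its external dependencies are unchanged.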