Caching function calls using precise dependencies
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
An experimental evaluation of continuous testing during development
ISSTA '04 Proceedings of the 2004 ACM SIGSOFT international symposium on Software testing and analysis
Estimating the Numbers of End Users and End User Programmers
VLHCC '05 Proceedings of the 2005 IEEE Symposium on Visual Languages and Human-Centric Computing
Software Development Environments for Scientific and Engineering Software: A Series of Case Studies
ICSE '07 Proceedings of the 29th international conference on Software Engineering
What Supercomputers Say: A Study of Five System Logs
DSN '07 Proceedings of the 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks
How to shadow every byte of memory used by a program
Proceedings of the 3rd international conference on Virtual execution environments
Querying and re-using workflows with VsTrails
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Self-adjusting computation: (an overview)
Proceedings of the 2009 ACM SIGPLAN workshop on Partial evaluation and program manipulation
CEAL: a C-based language for self-adjusting computation
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Trace-based just-in-time type specialization for dynamic languages
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Five Recommended Practices for Computational Scientists Who Write Software
IEEE Design & Test
Characterizing and predicting which bugs get fixed: an empirical study of Microsoft Windows
Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1
TAPP'10 Proceedings of the 2nd conference on Theory and practice of provenance
Linux kernel developer responses to static analysis bug reports
USENIX'09 Proceedings of the 2009 conference on USENIX Annual technical conference
The state of the art in end-user software engineering
ACM Computing Surveys (CSUR)
Purity and side effect analysis for java programs
VMCAI'05 Proceedings of the 6th international conference on Verification, Model Checking, and Abstract Interpretation
Metamouse: improving multi-user sharing of existing educational applications
Proceedings of the 4th ACM/IEEE International Conference on Information and Communication Technologies and Development
Efficient patch-based auditing for web application vulnerabilities
OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
Hi-index | 0.00 |
Programmers across a wide range of disciplines (e.g., bioinformatics, neuroscience, econometrics, finance, data mining, information retrieval, machine learning) write scripts to parse, transform, process, and extract insights from data. To speed up iteration times, they split their analyses into stages and write extra code to save the intermediate results of each stage to files so that those results do not have to be re-computed in every subsequent run. As they explore and refine hypotheses, their scripts often create and process lots of intermediate data files. They need to properly manage the myriad of dependencies between their code and data files, or else their analyses will produce incorrect results. To enable programmers to iterate quickly without needing to manage intermediate data files, we added a set of dynamic analyses to the programming language interpreter so that it automatically memoizes (caches) the results of long-running pure function calls to disk, manages dependencies between code and on-disk data, and later re-uses memoized results, rather than re-executing those functions, when guaranteed safe to do so. We created an implementation for Python and show how it enables programmers to iterate faster on their data analysis scripts while writing less code and not having to manage dependencies between their code and datasets.