zymake: a computational workflow system for machine learning and natural language processing

  • Author: Eric Breck
  • Affiliation: Cornell University, Ithaca, NY
  • Venue: SETQA-NLP '08: Software Engineering, Testing, and Quality Assurance for Natural Language Processing
  • Year: 2008

Abstract

Experiments in natural language processing and machine learning typically involve running a complicated network of programs to create, process, and evaluate data. Researchers often write one or more UNIX shell scripts to "glue" these pieces together, but such scripts are suboptimal for several reasons. Without significant additional work, a script cannot recover from failures, forces the researcher to keep track of complicated filenames, and does not run processes in parallel. In this paper, we present zymake as a solution to all of these problems. zymake scripts look like shell scripts but have semantics similar to makefiles. Using zymake improves the repeatability and scalability of experiments and provides a clean, simple interface for assembling components. A zymake script also serves as documentation for the complete workflow. We present a zymake script for a published set of NLP experiments and demonstrate that it is superior to alternative solutions, including shell scripts and makefiles, while being far simpler to use than scientific grid computing systems.
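The central idea above is that a workflow script can keep shell-script syntax while gaining makefile-style semantics. A minimal sketch of those semantics follows, assuming the classic make rule that a file is rebuilt only when it is missing or older than one of its inputs. This is not zymake's syntax or implementation: `Step`, `build`, `uppercase`, `demo`, and the file names are all hypothetical, and parallel execution is not shown. It only illustrates why a make-like workflow can resume after a failure instead of rerunning everything.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Step:
    """One hypothetical workflow step: derive `output` from `inputs` via `run`."""
    output: str
    inputs: List[str]
    run: Callable[[List[str], str], None]


def is_stale(step: Step) -> bool:
    """Make-style rule: rebuild if the output is missing or older than any input."""
    if not os.path.exists(step.output):
        return True
    out_time = os.path.getmtime(step.output)
    return any(os.path.getmtime(src) > out_time for src in step.inputs)


def build(target: str, steps: Dict[str, Step], log: List[str]) -> None:
    """Bring `target` up to date, building its prerequisites first (depth-first)."""
    step = steps.get(target)
    if step is None:
        # No rule produces this file, so it must be a pre-existing source file.
        if not os.path.exists(target):
            raise FileNotFoundError(target)
        return
    for src in step.inputs:
        build(src, steps, log)
    if is_stale(step):
        step.run(step.inputs, step.output)
        log.append(os.path.basename(step.output))


def uppercase(inputs: List[str], output: str) -> None:
    """Toy stand-in for a real program: write the uppercased first input."""
    with open(inputs[0]) as f:
        data = f.read()
    with open(output, "w") as f:
        f.write(data.upper())


def demo() -> Tuple[List[str], List[str], List[str]]:
    """Run a two-step pipeline three times and report which steps actually ran."""
    work = tempfile.mkdtemp()
    corpus, feats, model = (
        os.path.join(work, name) for name in ("corpus.txt", "feats.txt", "model.txt")
    )
    with open(corpus, "w") as f:
        f.write("the cat sat\n")
    steps = {
        feats: Step(feats, [corpus], uppercase),
        model: Step(model, [feats], uppercase),
    }

    def run_once() -> List[str]:
        log: List[str] = []
        build(model, steps, log)
        return log

    first = run_once()   # nothing exists yet, so every step runs
    second = run_once()  # all outputs are up to date, so nothing runs
    os.remove(model)     # simulate a final step that failed or was lost
    third = run_once()   # only the missing output is rebuilt
    return first, second, third


if __name__ == "__main__":
    print(demo())
```

In this sketch the first call builds both files, the second does no work, and after the final output is deleted only that step reruns. A plain shell script, by contrast, would rerun every command each time unless the researcher added such checks by hand.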