New Ideas Track: Testing MapReduce-Style Programs

Authors:
Christoph Csallner;Leonidas Fegaras;Chengkai Li
Affiliations:
University of Texas at Arlington, Arlington, TX, USA;University of Texas at Arlington, Arlington, TX, USA;University of Texas at Arlington, Arlington, TX, USA
Venue:
Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering
Year:
2011

Abstract

MapReduce has become a common programming model for processing very large amounts of data, a need shared by a wide spectrum of modern computing applications. Several MapReduce implementations and execution systems exist today, and many MapReduce programs are being developed and deployed in practice. However, developing MapReduce programs is not always easy, because the programming model makes programs prone to several MapReduce-specific bugs. Specifically, to produce deterministic results, a MapReduce program needs to satisfy certain high-level correctness conditions. A program that violates them may yield different output values on the same input data, depending on low-level infrastructure events such as network latency and scheduling decisions. Current MapReduce systems and tools lack support for checking these conditions and reporting violations. This paper presents a novel technique that systematically searches for such bugs in MapReduce applications and generates corresponding test cases. The technique works by encoding the high-level MapReduce correctness conditions as symbolic program constraints and checking them for the program under test. To the best of our knowledge, this is the first approach to addressing this problem in MapReduce-style programming.
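The abstract does not list the correctness conditions themselves. As an illustration only, the sketch below assumes one plausible condition of this kind: a reduce function should return the same result regardless of the order in which its values arrive, since that order can depend on scheduling and network timing. The function names (`reduce_sum`, `reduce_last_minus_first`, `find_order_violation`) are hypothetical, and the check enumerates permutations of a small concrete input by brute force; it is not the symbolic constraint encoding the paper describes.

```python
from itertools import permutations


def reduce_sum(key, values):
    """Order-insensitive reducer: addition is commutative and associative."""
    return sum(values)


def reduce_last_minus_first(key, values):
    """Order-sensitive reducer: the result depends on arrival order."""
    return values[-1] - values[0]


def find_order_violation(reduce_fn, key, values):
    """Return two orderings of `values` on which reduce_fn disagrees, or None.

    Brute-force stand-in (an assumption of this sketch) for a symbolic search:
    it tries every permutation of a small concrete value list.
    """
    baseline_order = list(values)
    baseline = reduce_fn(key, baseline_order)
    for perm in permutations(values):
        candidate = list(perm)
        if reduce_fn(key, candidate) != baseline:
            # The pair of orderings acts as a concrete failing test case.
            return baseline_order, candidate
    return None


if __name__ == "__main__":
    print(find_order_violation(reduce_sum, "k", [1, 2, 3]))
    print(find_order_violation(reduce_last_minus_first, "k", [1, 2, 3]))
```

In this sketch, the summing reducer yields no violation, while the order-sensitive reducer yields a pair of orderings that produce different outputs for the same input data. Exhaustive enumeration only scales to very short value lists, which is one reason a constraint-based search is attractive for real programs.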