On the role of tests in test-driven development: a differentiated and partial replication

  • Authors:
  • Davide Fucci; Burak Turhan

  • Affiliations:
  • Department of Information Processing Science, University of Oulu, Oulu, Finland; Department of Information Processing Science, University of Oulu, Oulu, Finland

  • Venue:
  • Empirical Software Engineering
  • Year:
  • 2014

Abstract

Background: Test-Driven Development (TDD) is claimed to have positive effects on external code quality and programmers' productivity. The main driver for these possible improvements is the tests enforced by the test-first nature of TDD, as previously investigated in a controlled experiment (the original study). Aim: Our goal is to examine the nature of the relationship between tests and external code quality, as well as programmers' productivity, in order to verify or refute the results of the original study. Method: We conducted a differentiated and partial replication of the original setting and the related analyses, with a focus on the role of tests. Specifically, while the original study compared test-first vs. test-last, our replication employed the test-first treatment only. The replication involved 30 students, working in pairs or as individuals, in the context of a graduate course, and resulted in 16 software artifacts. We performed linear regression to test the original study's hypotheses, and analyses of covariance to test the additional hypotheses imposed by the changes in the replication settings. Results: We found a significant correlation (Spearman coefficient = 0.66, p-value = 0.004) between the number of tests and productivity, and a positive regression coefficient (p-value = 0.011). We found no significant correlation (Spearman coefficient = 0.41, p-value = 0.11) between the number of tests and external code quality (regression coefficient p-value = 0.0513). In both cases we observed no statistically significant interaction caused by the subject units being individuals or pairs. Further, our results are consistent with the original study, although there were changes in the timing constraints for finishing the task and in the enforced development processes. Conclusions: This replication study confirms the results of the original study concerning the relationship between the number of tests and both external code quality and programmer productivity. Moreover, this replication allows us to identify additional context variables for which the original results still hold, namely the subject unit, the timing constraint, and the isolation of the test-first process. Based on our findings, we recommend that practitioners implement as many tests as possible in order to achieve higher baselines for quality and productivity.
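
As an illustration of the Spearman coefficient reported in the Results, the following minimal sketch computes a rank correlation in plain Python. It is not the authors' analysis script: the function names (`average_ranks`, `pearson`, `spearman`) and the sample values are hypothetical, and the numbers are not data from the study. It also omits the p-value, which would normally come from a statistics package.

```python
from statistics import mean


def average_ranks(values):
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman coefficient: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values for illustration only (NOT data from the study).
tests_written = [3, 5, 8, 8, 12, 15]
productivity = [10, 30, 20, 45, 40, 60]
rho = spearman(tests_written, productivity)
print(f"Spearman coefficient: {rho:.2f}")
```

Because the coefficient works on ranks rather than raw values, it captures any monotonic relationship between the number of tests and the outcome measure, not only a linear one.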