More testers - The effect of crowd size and time restriction in software testing

Authors:
Mika V. Mäntylä;Juha Itkonen
Affiliations:
Department of Computer Science and Engineering, Aalto University, Finland and Department of Computer Science, Lund University, Sweden;Department of Computer Science and Engineering, Aalto University, Finland
Venue:
Information and Software Technology
Year:
2013

Abstract

Context: The questions of how many individuals and how much time to use for a single testing task are critical in software verification and validation. In software review and usability evaluation contexts, positive effects of using multiple individuals for a task have been found, but software testing has not been studied from this viewpoint.
Objective: We study how adding individuals and imposing time pressure affect the effectiveness and efficiency of manual testing tasks. We applied the group productivity theory from social psychology to characterize the type of software testing tasks.
Method: We conducted an experiment in which 130 students performed manual testing under two conditions: one with a time restriction and pressure, i.e., a 2-h fixed slot, and another in which the individuals could use as much time as they needed.
Results: We found evidence that manual software testing is an additive task with a ceiling effect, like software reviews and usability inspections. Our results show that a crowd of five time-restricted testers using 10 h in total detected 71% more defects than a single non-time-restricted tester using 9.9 h. Furthermore, we use the F-score measure from the information retrieval domain to analyze the optimal number of testers in terms of both effectiveness and validity of testing results. We suggest that future studies on verification and validation practices use the F-score to provide a more transparent view of the results.
Conclusions: The results seem promising for time-pressured crowds, indicating that multiple time-pressured individuals deliver superior defect detection effectiveness in comparison to non-time-pressured individuals. However, caution is needed, as the limitations of this study need to be addressed in future work. Finally, we suggest that the size of the crowd used in software testing tasks should be determined based on the share of duplicate and invalid reports produced by the crowd and on the effectiveness of the duplicate handling mechanisms.
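
The F-score mentioned in the Results can be illustrated with a minimal sketch. It assumes that precision is the share of submitted reports that describe valid, unique defects and that recall is the share of known defects the crowd found. The abstract does not spell out this mapping, so the function names, parameters, and example counts below are hypothetical and are not taken from the study.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F-beta; beta=1 gives F1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def crowd_f_score(valid_unique: int, total_reports: int, known_defects: int) -> float:
    """F1 for a crowd's pooled defect reports.

    Assumed mapping (not confirmed by the abstract):
      precision = valid unique defects / all reports submitted
                  (duplicates and invalid reports lower it)
      recall    = valid unique defects / all known defects
    """
    if total_reports == 0 or known_defects == 0:
        return 0.0
    precision = valid_unique / total_reports
    recall = valid_unique / known_defects
    return f_score(precision, recall)


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(round(crowd_f_score(valid_unique=20, total_reports=40, known_defects=50), 3))
```

Under this reading, adding testers tends to raise recall while duplicates and invalid reports pull precision down, which is why a single combined score can help when choosing a crowd size.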