How many individuals to use in a QA task with fixed total effort?
Proceedings of the ACM-IEEE international symposium on Empirical software engineering and measurement
Empirical Software Engineering
Context: The questions of how many individuals and how much time to use for a single testing task are critical in software verification and validation. In software review and usability evaluation contexts, positive effects of using multiple individuals for a task have been found, but software testing has not been studied from this viewpoint.

Objective: We study how adding individuals and imposing time pressure affect the effectiveness and efficiency of manual testing tasks. We applied group productivity theory from social psychology to characterize the type of software testing tasks.

Method: We conducted an experiment in which 130 students performed manual testing under two conditions: one with a time restriction and pressure, i.e., a fixed two-hour slot, and another in which individuals could use as much time as they needed.

Results: We found evidence that manual software testing is an additive task with a ceiling effect, like software reviews and usability inspections. Our results show that a crowd of five time-restricted testers using 10 h in total detected 71% more defects than a single non-time-restricted tester using 9.9 h. Furthermore, we use the F-score measure from the information retrieval domain to analyze the optimal number of testers in terms of both the effectiveness and the validity of the testing results. We suggest that future studies on verification and validation practices use the F-score to provide a more transparent view of their results.

Conclusions: The results seem promising for time-pressured crowds, indicating that multiple time-pressured individuals deliver superior defect detection effectiveness compared with non-time-pressured individuals. However, caution is needed, as the limitations of this study must be addressed in future work.
Finally, we suggest that the size of the crowd used in software testing tasks should be determined based on the share of duplicate and invalid reports produced by the crowd and on the effectiveness of the duplicate-handling mechanisms.
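The F-score analysis mentioned in the abstract can be sketched as follows. This is a minimal illustration rather than the study's actual computation: precision is taken here as the share of filed reports that are valid, unique defects, and recall as the share of all known defects the crowd detected; the function names and the example numbers are hypothetical.

```python
def f_score(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (standard F-score)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

def crowd_f_score(valid_unique, duplicates_invalid, total_defects):
    """F-score for a testing crowd's output.

    precision: valid unique defect reports / all reports filed
    recall:    valid unique defect reports / all known defects
    """
    reports = valid_unique + duplicates_invalid
    precision = valid_unique / reports if reports else 0.0
    recall = valid_unique / total_defects
    return f_score(precision, recall)

# Hypothetical example: a crowd files 40 reports, of which 25 are valid
# unique defects, out of 33 known defects in the system under test.
print(round(crowd_f_score(25, 15, 33), 3))  # prints 0.685
```

Plotting this score against crowd size makes the trade-off visible: adding testers raises recall toward the ceiling while duplicates and invalid reports erode precision, so the F-score peaks at some intermediate crowd size.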