Toward practical mutation analysis for evaluating the quality of student-written software tests
Proceedings of the ninth annual international ACM conference on International computing education research
Although software testing is a regular part of many programming courses, the assessment techniques that automated grading tools currently use to evaluate student-written software tests are imperfect. Code coverage measures are typically used in practice, but coverage does not assess how much of the expected behavior the tests actually check, and it sometimes overestimates their true quality. Two more robust and thorough measures for evaluating student-written tests are running each student's tests against other students' solutions (known as all-pairs testing) and injecting artificial bugs to determine whether the tests can detect them (known as mutation analysis). Even though both are better indicators of test quality, each poses a number of practical obstacles to classroom use. This proposal describes the technical obstacles to using these two approaches in automated grading. We propose novel, practical solutions for applying all-pairs testing and mutation analysis to student-written tests, especially in the context of classroom grading tools. Experimental results from applying our techniques to eight CS1 and CS2 assignments submitted by 147 students show the feasibility of our solution. Finally, we discuss our plan to combine the two approaches to evaluate tests for assignments with varying amounts of design freedom, and we explain how we will evaluate the combined approach.
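To make the abstract's contrast between code coverage and mutation analysis concrete, the sketch below is a hypothetical illustration only (it is not code from the proposal, and the class and method names are invented). It shows how an injected bug, or mutant, can survive a weak test that still achieves full statement coverage, while a stronger test "kills" the mutant.

    // Hypothetical illustration of mutation analysis; names are invented,
    // not taken from the proposal or from any grading tool it describes.
    public class MutationDemo {

        // A correct student solution.
        static int max(int a, int b) {
            return a > b ? a : b;
        }

        // A mutant of the solution: the comparison operator is flipped,
        // simulating the kind of artificial bug mutation analysis injects.
        static int maxMutant(int a, int b) {
            return a < b ? a : b;
        }

        public static void main(String[] args) {
            // Weak test input: it executes the method (full statement
            // coverage) but cannot distinguish the mutant, which survives.
            System.out.println("weak test detects mutant:   "
                    + (maxMutant(7, 7) != max(7, 7)));   // prints false

            // Stronger test input: the mutant returns a different result,
            // so it is detected ("killed") and the test earns credit.
            System.out.println("strong test detects mutant: "
                    + (maxMutant(7, 3) != max(7, 3)));   // prints true
        }
    }

In the same spirit, the all-pairs idea would run both of these tests against every other student's implementation of the same assignment; only the stronger test would reliably separate correct submissions from incorrect ones.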