Automated assessment of short free-text responses in computer science using latent semantic analysis

  • Authors:
  • Richard Klein; Angelo Kyrilov; Mayya Tokman

  • Affiliations:
  • University of the Witwatersrand, Johannesburg, South Africa; University of the Witwatersrand, Johannesburg, South Africa; University of California Merced, Merced, CA, USA

  • Venue:
  • Proceedings of the 16th annual joint conference on Innovation and technology in computer science education
  • Year:
  • 2011

Abstract

In the last few decades, much research has focused on the evaluation and assessment of students' knowledge. The idea that computers can now be used to aid assessment is appealing. While implementing automatic marking of multiple-choice questions is trivial, most educators agree that this form of assessment provides only limited insight into students' knowledge. Due to this limitation, teachers prefer to use unstructured questions in assessments. These questions, however, are much more difficult to mark, because the semantic meaning of a response matters more than any individual keyword. Latent Semantic Analysis (LSA) is well suited to inferring meaning from text in this way. This paper describes the design, implementation, and evaluation of an automatic marking system based on LSA and designed to grade paragraph responses to exam questions. In addition to presenting the algorithm and the theoretical basis of the system, we describe the experiments conducted to evaluate its efficacy. These included comparing the marks generated by the system for exams from several computer science courses with the original grades awarded by a human examiner. The various settings of the system are studied to understand their effect on accuracy. Using this understanding, along with trial and error, good configurations for each question are found. Under ideal configurations the system is capable of generating marks whose correlation with the human examiner's grades exceeds 0.80, which is considered acceptable. Generating these ideal configurations is nontrivial but possible by designing effective ways to extract appropriate settings from features of the data. Given enough training data, the system performs at rates that match the inter-rater correlation between human markers.
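The core LSA pipeline the abstract alludes to can be sketched roughly as follows: build a term-document matrix from previously marked answers, truncate its SVD to a low-rank semantic space, fold new responses into that space, and score a student response by cosine similarity to a model answer. This is a minimal illustration, not the authors' implementation; the tiny corpus, the rank `k=2`, and all function names are assumptions for the sake of the example.

```python
import numpy as np

def term_doc_matrix(docs):
    """Raw term-frequency matrix (terms x documents) over a small corpus."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            A[index[w], j] += 1.0
    return A, index

def lsa_space(A, k):
    """Truncated SVD: keep the top-k left singular vectors and values."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k]

def project(doc, index, Uk, sk):
    """Fold a new document into the k-dimensional semantic space."""
    v = np.zeros(Uk.shape[0])
    for w in doc.lower().split():
        if w in index:          # out-of-vocabulary words are ignored
            v[index[w]] += 1.0
    return (Uk.T @ v) / sk

def similarity(a, b):
    """Cosine similarity between two vectors in the LSA space."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Illustrative "training" corpus of previously marked answers.
corpus = [
    "a stack is a last in first out data structure",
    "a queue is a first in first out data structure",
    "recursion is when a function calls itself",
]
A, index = term_doc_matrix(corpus)
Uk, sk = lsa_space(A, k=2)       # k is one of the settings to be tuned

model = project("a stack is a last in first out structure", index, Uk, sk)
student = project("a stack stores items last in first out", index, Uk, sk)
print(similarity(model, student))
```

In a real system of the kind the paper describes, the similarity score would be mapped onto a mark scale calibrated against human-graded training responses, and settings such as the rank `k` would be tuned per question.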