Scaling short-answer grading by combining peer assessment with algorithmic scoring

Authors:
Chinmay E. Kulkarni; Richard Socher; Michael S. Bernstein; Scott R. Klemmer
Affiliations:
Stanford University, Stanford, CA, USA; Stanford University, Stanford, CA, USA; Stanford University, Stanford, CA, USA; UC San Diego, La Jolla, CA, USA
Venue:
Proceedings of the first ACM conference on Learning @ scale conference
Year:
2014

Abstract

Peer assessment helps students reflect and exposes them to different ideas. It scales assessment and allows large online classes to use open-ended assignments. However, it requires students to spend significant time grading. How can we lower this grading burden while maintaining quality? This paper integrates peer and machine grading to preserve the robustness of peer assessment while lowering the grading burden. In the identify-verify pattern, a grading algorithm first predicts a student's grade and estimates its confidence in that prediction; this confidence determines how many peer raters are required. Peers then identify key features of the answer using a rubric. Finally, other peers verify whether these feature labels were applied accurately. The pattern thus adjusts the number of peers who evaluate an answer based on algorithmic confidence and peer agreement. We evaluated this pattern with 1370 students in a large online design class. With only 54% of the student grading time, the identify-verify pattern yields 80-90% of the accuracy obtained by taking the median of three peer scores, and it provides more detailed feedback. A second experiment found that verification dramatically improves accuracy as more raters are added, yielding a 20% gain over the peer median with four raters. However, verification also leads to lower initial trust in the grading system. The identify-verify pattern provides an example of how peer work and machine learning can combine to improve the learning experience.
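The abstract describes the identify-verify pattern only at a high level, so the following is a minimal, hypothetical sketch of its control flow rather than the authors' implementation. It assumes that the grading algorithm reports a confidence in [0, 1], that identifiers have already proposed which rubric features apply, and that verifiers cast yes/no votes on each label. The function names (`raters_needed`, `verify_feature`, `score_answer`, `peer_median`), the confidence thresholds, the margin-based stopping rule, and the rubric features and point values are all illustrative assumptions; only the peer-median baseline comes from the abstract.

```python
from collections.abc import Iterable
from statistics import median
from typing import Optional


def raters_needed(confidence: float) -> int:
    """Map the algorithm's confidence (assumed in [0, 1]) to a number of peer
    identifiers. The thresholds are hypothetical, not taken from the paper."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if confidence >= 0.9:
        return 1  # confident prediction: a single identifier
    if confidence >= 0.6:
        return 2
    return 3      # low confidence: ask more peers


def verify_feature(votes: Iterable[bool], margin: int = 2) -> tuple[Optional[bool], int]:
    """Consume verifier votes on one feature label until accepts and rejects
    differ by `margin` (an assumed stopping rule). Returns (decision, votes_used);
    decision is None if the votes run out before agreement is reached."""
    lead = 0
    used = 0
    for vote in votes:
        used += 1
        lead += 1 if vote else -1
        if abs(lead) >= margin:
            return lead > 0, used
    return None, used


def score_answer(rubric: dict[str, int], decisions: dict[str, Optional[bool]]) -> int:
    """Sum rubric points for features whose labels verifiers accepted;
    rejected or undecided labels earn nothing."""
    return sum(points for feature, points in rubric.items() if decisions.get(feature) is True)


def peer_median(scores: list[float]) -> float:
    """Baseline from the abstract: the median of independent peer scores."""
    return median(scores)


# Hypothetical rubric features and point values for one short answer.
rubric = {"states_problem": 2, "gives_example": 1, "cites_principle": 2}
decisions = {
    "states_problem": verify_feature([True, True])[0],                # accepted after 2 votes
    "gives_example": verify_feature([True, False, False, False])[0],  # rejected after 4 votes
    "cites_principle": verify_feature([True, True, True])[0],         # accepted after 2 votes
}
print(raters_needed(0.95), raters_needed(0.5))  # 1 3
print(score_answer(rubric, decisions))          # 4
print(peer_median([3, 5, 4]))                   # 4
```

The margin rule is one simple way to let peer agreement decide how many verifiers a label receives: two agreeing votes settle it, while a split pulls in more verifiers. This mirrors, under the stated assumptions, the abstract's point that effort concentrates on answers where the algorithm is unsure or peers disagree.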