An analysis of question quality and user performance in crowdsourced exams

  • Authors:
  • Sarah K. K. Luger; Jeff Bowles

  • Affiliations:
  • The University of Edinburgh, Edinburgh, United Kingdom; University of New Mexico, Albuquerque, NM, USA

  • Venue:
  • Proceedings of the 2013 workshop on Data-driven user behavioral modelling and mining from social media
  • Year:
  • 2013

Abstract

Automated systems that can measure the difficulty and discrimination power of Multiple Choice Questions (MCQs) have value for both educators, who spend large amounts of time creating novel questions, and students, who spend a great deal of effort practicing for and taking tests. Assistance in creating high-quality assessment instruments would be welcomed by educators who do not have direct access to the proprietary data and methods used by educational testing companies. The current approach to measuring question difficulty relies on modeling how high-performing pupils answer a question and contrasting their performance with that of their lower-performing peers. Inverting this process, so that educators can test their questions before students answer them, would speed up question development and improve question utility. This paper covers both a method for automatically judging the difficulty and discriminating power of MCQs and a strategy for building sound exams from the best of these questions. It also presents a wider discussion of how the method extends to several other domains whose data behave similarly to MCQs.
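
For context, the upper/lower performance contrast described in the abstract resembles classical item analysis. The sketch below is illustrative only, not the authors' system: it assumes each question's responses are recorded as 0/1 correctness values and that each examinee has a total exam score, and the function names and the 27% group split are assumptions introduced here.

    from statistics import mean

    def item_difficulty(responses):
        # Classical difficulty (p-value): proportion of examinees answering correctly.
        return sum(responses) / len(responses)

    def item_discrimination(responses, total_scores, fraction=0.27):
        # Upper-lower discrimination index: proportion correct among the
        # top-scoring examinees minus proportion correct among the bottom-scoring ones.
        n = max(1, int(len(total_scores) * fraction))
        ranked = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
        lower, upper = ranked[:n], ranked[-n:]
        return mean(responses[i] for i in upper) - mean(responses[i] for i in lower)

Under these assumptions, an item with moderate difficulty and a clearly positive discrimination index separates stronger from weaker examinees, while items with near-zero or negative discrimination are candidates for revision or removal when assembling an exam.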