Item difficulty estimation: An auspicious collaboration between data and judgment

  • Authors:
  • Kelly Wauters, Piet Desmet, Wim Van Den Noortgate

  • Affiliations:
  • Kelly Wauters: ITEC - IBBT, K.U.Leuven Kulak, Etienne Sabbelaan 53, B-8500 Kortrijk, Belgium; Faculty of Psychology and Educational Sciences, K.U.Leuven, Tiensestraat 102, B-3000 Leuven, Belgium
  • Piet Desmet: ITEC - IBBT, K.U.Leuven Kulak, Etienne Sabbelaan 53, B-8500 Kortrijk, Belgium; Faculty of Arts, K.U.Leuven, Blijde Inkomststraat 21, B-3000 Leuven, Belgium
  • Wim Van Den Noortgate: ITEC - IBBT, K.U.Leuven Kulak, Etienne Sabbelaan 53, B-8500 Kortrijk, Belgium; Faculty of Psychology and Educational Sciences, K.U.Leuven, Tiensestraat 102, B-3000 Leuven, Belgium

  • Venue:
  • Computers & Education
  • Year:
  • 2012

Abstract

The evolution from static to dynamic electronic learning environments has stimulated research on adaptive item sequencing. A prerequisite for adaptive item sequencing, in which the difficulty of the item is continuously matched to the ability level of the learner, is having items with a known difficulty level. The difficulty level can be estimated by means of item response theory (IRT). However, the large sample size required for calibrating items with IRT models is not easily met in many practical learning situations. The aim of this paper is to identify relatively simple and fast alternative estimation methods and to assess their accuracy, compared with IRT-based calibration, in a single setting and for various sample sizes. Using real data, six alternative estimation methods are compared with IRT-based calibration: proportion correct, learner feedback, expert rating, one-to-many comparison (learner), one-to-many comparison (expert), and the Elo rating system. Results indicate that proportion correct has the strongest relation with the IRT-based difficulty estimates, followed by learner feedback, the Elo rating system, expert rating, and finally one-to-many comparison. Learner feedback and one-to-many comparison (learner) provide stable estimates even with a small sample size. IRT, proportion correct, and the Elo rating system provide reliable estimates, especially with a sample size of 200-250 learners. The alternative estimation methods can be used for adaptive item sequencing when IRT-based calibration does not yet provide reliable estimates, or as priors in a Bayesian estimation method.
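
To make the two purely data-driven methods named in the abstract concrete, the sketch below shows how proportion correct can be mapped onto an IRT-style difficulty scale and how an Elo-style update adjusts an item's difficulty after each response. This is a minimal illustration assuming a Rasch (one-parameter logistic) model; the function names, the smoothing constant, and the update weight k are illustrative assumptions, not the procedure used in the paper.

```python
import math

def proportion_correct_difficulty(responses):
    """Estimate item difficulty from the proportion of correct responses.

    Under a Rasch-type model, an item answered correctly by a proportion p
    of learners of average ability has difficulty roughly -ln(p / (1 - p)).
    `responses` is a list of 0/1 scores for one item.
    """
    n = len(responses)
    # Laplace-style smoothing keeps the logit finite for items that are
    # answered all correctly or all incorrectly.
    p = (sum(responses) + 0.5) / (n + 1.0)
    return -math.log(p / (1.0 - p))

def elo_update(ability, difficulty, correct, k=0.4):
    """One Elo-style update after a learner attempts an item.

    The expected score is the Rasch probability of a correct response;
    ability and difficulty move in opposite directions by
    k * (observed - expected). The weight k is an arbitrary choice here.
    """
    expected = 1.0 / (1.0 + math.exp(-(ability - difficulty)))
    ability += k * (correct - expected)
    difficulty -= k * (correct - expected)
    return ability, difficulty

# Toy usage: estimate one item's difficulty from a short response stream.
if __name__ == "__main__":
    scores = [1, 0, 1, 1, 0, 1, 1, 1]
    print("proportion-correct estimate:", round(proportion_correct_difficulty(scores), 2))

    theta, beta = 0.0, 0.0  # start learner and item at the scale midpoint
    for s in scores:
        theta, beta = elo_update(theta, beta, s)
    print("Elo difficulty estimate:", round(beta, 2))
```

A practical appeal of the Elo-style update for adaptive item sequencing is that it recalibrates online, one response at a time, rather than waiting for the batch of roughly 200-250 learners that the abstract reports IRT and proportion correct need for reliable estimates.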