Weak inter-rater reliability in heuristic evaluation of video games

  • Authors:
  • Gareth R. White, Pejman Mirza-babaei, Graham McAllister, Judith Good

  • Affiliations:
  • The University of Sussex, Brighton, United Kingdom (all authors)

  • Venue:
  • CHI '11 Extended Abstracts on Human Factors in Computing Systems
  • Year:
  • 2011

Abstract

Heuristic evaluation promises to be a low-cost usability evaluation method, but it is fraught with problems of subjective interpretation and a proliferation of competing and contradictory heuristic lists. This is particularly true in games research, where no rigorous comparative validation has yet been published. To validate the available heuristics, a user test of a commercial game is conducted with 6 participants, identifying 88 issues, against which 3 evaluators rate 146 heuristics for relevance. Inter-rater reliability is weak, with a Krippendorff's alpha of 0.343, so none of the available heuristics can be validated. This weak reliability is attributed to the high complexity of video games: evaluators interpret different reasonable causes of, and solutions for, the issues, leading to wide variance in their ratings of the heuristics.
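The headline statistic is Krippendorff's alpha computed over the evaluators' relevance ratings. As a rough illustration of how such a figure is obtained, below is a minimal sketch of interval-level Krippendorff's alpha in Python with NumPy; the function name, the toy data, and the choice of interval-level measurement are assumptions for illustration, not the authors' analysis code.

```python
# Minimal sketch of Krippendorff's alpha for interval-level data.
# Assumes ratings form a (raters x units) matrix with np.nan for
# missing entries; units rated by fewer than two raters are ignored.
import numpy as np

def krippendorff_alpha_interval(ratings: np.ndarray) -> float:
    # Keep, per unit (column), only the non-missing ratings, and only
    # units with at least two ratings (otherwise nothing is pairable).
    units = []
    for col in ratings.T:
        vals = col[~np.isnan(col)]
        if len(vals) >= 2:
            units.append(vals)

    # n = total number of pairable values across all retained units.
    n = sum(len(vals) for vals in units)

    # Observed disagreement: squared differences over ordered pairs
    # within each unit, weighted by 1 / (m_u - 1), averaged over n.
    d_o = 0.0
    for vals in units:
        diffs = vals[:, None] - vals[None, :]
        d_o += np.sum(diffs ** 2) / (len(vals) - 1)
    d_o /= n

    # Expected disagreement: squared differences over all ordered pairs
    # of pairable values pooled across units.
    pooled = np.concatenate(units)
    diffs = pooled[:, None] - pooled[None, :]
    d_e = np.sum(diffs ** 2) / (n * (n - 1))

    if d_e == 0.0:
        return float("nan")  # alpha is undefined when all values agree
    return 1.0 - d_o / d_e

# Hypothetical example: 3 evaluators rating 5 items on a 1-5 scale,
# with one missing rating. Prints a high alpha (near 1), since the
# toy ratings mostly agree.
data = np.array([
    [1, 2, 3, 3, np.nan],
    [1, 2, 3, 4, 5],
    [1, 3, 3, 4, 5],
], dtype=float)
print(round(krippendorff_alpha_interval(data), 3))
```

With real data, the study's 3 x 146 matrix of relevance ratings would take the place of the toy array; for actual analysis an established implementation (e.g., the krippendorff package on PyPI) would be preferable to this sketch.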