Weak inter-rater reliability in heuristic evaluation of video games

  • Authors:
  • Gareth R. White, Pejman Mirza-babaei, Graham McAllister, Judith Good

  • Affiliations:
  • The University of Sussex, Brighton, United Kingdom (all authors)

  • Venue:
  • CHI '11 Extended Abstracts on Human Factors in Computing Systems
  • Year:
  • 2011

Abstract

Heuristic evaluation promises to be a low-cost usability evaluation method, but it is fraught with problems of subjective interpretation and a proliferation of competing and contradictory heuristic lists. This is particularly true in games research, where no rigorous comparative validation has yet been published. To validate the available heuristics, a user test of a commercial game is conducted with 6 participants, identifying 88 issues, against which 3 evaluators rate 146 heuristics for relevance. Inter-rater reliability is weak, with a Krippendorff's alpha of 0.343, so none of the available heuristics can be validated. This weak reliability is attributed to the high complexity of video games: evaluators interpret different reasonable causes of, and solutions for, the issues, leading to wide variance in their ratings of the heuristics.
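The headline statistic is Krippendorff's alpha computed over the evaluators' relevance ratings. As a rough illustration of how such a figure is obtained, below is a minimal sketch of interval-level Krippendorff's alpha in Python with NumPy; the function name, the toy data, and the choice of interval-level measurement are assumptions for illustration, not the authors' analysis code.

```python
# Minimal sketch of Krippendorff's alpha for interval-level data.
# Assumes ratings form a (raters x units) matrix with np.nan for
# missing entries; units rated by fewer than two raters are ignored.
import numpy as np

def krippendorff_alpha_interval(ratings: np.ndarray) -> float:
    # Keep, per unit (column), only the non-missing ratings, and only
    # units with at least two ratings (otherwise nothing is pairable).
    units = []
    for col in ratings.T:
        vals = col[~np.isnan(col)]
        if len(vals) >= 2:
            units.append(vals)

    # n = total number of pairable values across all retained units.
    n = sum(len(vals) for vals in units)

    # Observed disagreement: squared differences over ordered pairs
    # within each unit, weighted by 1 / (m_u - 1), averaged over n.
    d_o = 0.0
    for vals in units:
        diffs = vals[:, None] - vals[None, :]
        d_o += np.sum(diffs ** 2) / (len(vals) - 1)
    d_o /= n

    # Expected disagreement: squared differences over all ordered pairs
    # of pairable values pooled across units.
    pooled = np.concatenate(units)
    diffs = pooled[:, None] - pooled[None, :]
    d_e = np.sum(diffs ** 2) / (n * (n - 1))

    if d_e == 0.0:
        return float("nan")  # alpha is undefined when all values agree
    return 1.0 - d_o / d_e

# Hypothetical example: 3 evaluators rating 5 items on a 1-5 scale,
# with one missing rating. Prints a high alpha (near 1), since the
# toy ratings mostly agree.
data = np.array([
    [1, 2, 3, 3, np.nan],
    [1, 2, 3, 4, 5],
    [1, 3, 3, 4, 5],
], dtype=float)
print(round(krippendorff_alpha_interval(data), 3))
```

With real data, the study's 3 x 146 matrix of relevance ratings would take the place of the toy array; for actual analysis an established implementation (e.g., the krippendorff package on PyPI) would be preferable to this sketch.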