Comparison of Four Subjective Methods for Image Quality Assessment

Authors:
Rafał K. Mantiuk;Anna Tomaszewska;Radosław Mantiuk
Affiliations:
Bangor University, United Kingdom mantiuk@bangor.ac.uk;West Pomeranian University of Technology in Szczecin, Poland;West Pomeranian University of Technology in Szczecin, Poland
Venue:
Computer Graphics Forum
Year:
2012

Abstract

To provide convincing proof that a new method is better than the state of the art, computer graphics projects are often accompanied by user studies, in which a group of observers rank or rate the results of several algorithms. Such user studies, known as subjective image quality assessment experiments, can be very time-consuming and are not guaranteed to produce conclusive results. This paper is intended to help design efficient and rigorous quality assessment experiments and to emphasise the key aspects of the analysis of their results. To promote good standards of data analysis, we review the major methods for data analysis, such as establishing confidence intervals, statistical testing and retrospective power analysis. We explore two methods of visualising ranking results together with meaningful information about statistical and practical significance. Finally, we compare the four most prominent subjective quality assessment methods: single-stimulus, double-stimulus, forced-choice pairwise comparison and similarity judgements. We conclude that the forced-choice pairwise comparison method results in the smallest measurement variance and thus produces the most accurate results. This method is also the most time-efficient, assuming a moderate number of compared conditions. © 2012 Wiley Periodicals, Inc.
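
As a rough illustration of the forced-choice pairwise comparison and confidence-interval ideas mentioned in the abstract, the sketch below tallies how often each observer preferred each condition and reports a mean vote count with a confidence interval. It is not the analysis procedure from the paper: the data, the names `vote_counts` and `mean_with_ci`, and the use of a normal approximation for the interval are all assumptions made for illustration (with few observers, a t-distribution would usually be more appropriate).

```python
from collections import defaultdict
from math import sqrt
from statistics import NormalDist, mean, stdev


def vote_counts(trials, conditions):
    """Count how often each condition was preferred, per observer.

    trials: iterable of (observer, winner, loser) tuples (hypothetical format).
    Returns {condition: [votes by first observer, votes by second, ...]}.
    """
    per_observer = defaultdict(lambda: dict.fromkeys(conditions, 0))
    for observer, winner, _loser in trials:
        per_observer[observer][winner] += 1
    return {c: [votes[c] for votes in per_observer.values()] for c in conditions}


def mean_with_ci(samples, level=0.95):
    """Return (mean, half-width) of a normal-approximation confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(samples) / sqrt(len(samples))
    return mean(samples), half_width


# Made-up example: three observers each compare all pairs of conditions A, B, C.
conditions = ["A", "B", "C"]
trials = [
    ("o1", "A", "B"), ("o1", "A", "C"), ("o1", "B", "C"),
    ("o2", "A", "B"), ("o2", "C", "A"), ("o2", "B", "C"),
    ("o3", "A", "B"), ("o3", "A", "C"), ("o3", "C", "B"),
]

counts = vote_counts(trials, conditions)
for condition in conditions:
    m, hw = mean_with_ci(counts[condition])
    print(f"{condition}: mean votes = {m:.2f} +/- {hw:.2f}")
```

Overlapping intervals in such output would suggest that the experiment cannot separate the conditions at the chosen confidence level, which is the kind of inconclusive result the abstract warns about.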