Number of people required for usability evaluation: the 10±2 rule

Authors:
Wonil Hwang; Gavriel Salvendy
Affiliations:
Soongsil University in Seoul, Korea; Purdue University in West Lafayette, Indiana and Tsinghua University in Beijing, P.R. China
Venue:
Communications of the ACM
Year:
2010

Abstract

Introduction

Usability evaluation is essential to ensure that newly released software products are easy to use, efficient and effective in helping users reach their goals, and satisfying to users. For example, a software company that wants to develop and sell a new product needs to evaluate its usability before launch, because the product may contain usability problems ranging from cosmetic flaws to severe functional defects.

Three widely used methods for usability evaluation are Think Aloud (TA), Heuristic Evaluation (HE), and Cognitive Walkthrough (CW). The TA method is commonly employed in lab-based user testing, although variants exist, such as thinking aloud at the user's workplace rather than in a lab. What we discuss here is the TA method combined with lab-based user testing, in which test users use a product while continuously thinking aloud, and experimenters record their behavior and verbal protocols in the laboratory. HE is a usability inspection method in which a small number of evaluators find usability problems in a user interface design by examining the interface and judging its compliance with well-known usability principles, called heuristics. CW is a theory-based method in which evaluators examine every step needed to perform a scenario-based task and look for usability problems that would interfere with learning by exploration.

Each of these three methods has its own advantages and disadvantages. For instance, the TA method provides good qualitative data from a small number of test users, but the laboratory environment may influence their behavior. HE is cheap, fast, and easy to use, but it often finds overly specific, low-priority usability problems, including some that are not real problems. CW helps find mismatches between users' and designers' conceptualizations of a task, but applying it requires extensive knowledge of cognitive psychology and of technical details.

Although these advantages and disadvantages describe the overall characteristics of the three major usability evaluation methods, they do not allow us to compare the methods quantitatively or to see their efficiency clearly. Because one reason the so-called discount methods, such as HE and CW, were developed was to reduce the cost of usability evaluation, cost-related criteria for comparing usability evaluation methods are meaningful to usability practitioners and researchers alike.

One of the most disputed issues related to the cost of usability evaluation is sample size: how many users or evaluators are needed to achieve a targeted level of usability evaluation performance, for example an overall discovery rate of 80%? The sample size for a usability evaluation is known to depend on an estimate of the problem discovery rate across participants. The overall discovery rate is a common quantitative measure used in most usability evaluation studies to show the effectiveness of a specific usability evaluation method. Also called the overall detection rate or the thoroughness measure, it is the ratio of 'the number of unique usability problems detected by all experiment participants' to 'the number of usability problems that exist in the evaluated system', and it ranges between 0 and 1. The overall discovery rate has been reported more often than any other criterion measure in usability evaluation experiments, and it is also a key component in projecting the required sample size for a usability evaluation study. Thus, how many test users or evaluators participate in a usability evaluation is a critical issue for its cost-effectiveness.
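To make the overall discovery rate and its link to sample size concrete, here is a minimal Python sketch. The first function follows the ratio defined above. The second assumes the cumulative binomial model 1 - (1 - p)^n that is commonly used in the sample-size literature, where p is the average probability that a single participant detects a given problem. That model is an assumption brought in for illustration and is not necessarily the procedure the authors use. The function names and the example data are hypothetical.

```python
def overall_discovery_rate(detections, total_problems):
    """Ratio of unique problems found by all participants to problems that exist.

    detections: one set of problem IDs per participant (hypothetical data layout).
    total_problems: number of usability problems known to exist in the system.
    """
    if total_problems <= 0:
        raise ValueError("total_problems must be positive")
    unique_found = set().union(*detections)  # problems found by at least one participant
    return len(unique_found) / total_problems


def required_sample_size(p, target=0.80):
    """Smallest n such that 1 - (1 - p)**n >= target.

    Assumes the cumulative binomial model (an illustrative assumption):
    p is the average per-participant problem discovery rate.
    """
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    n = 1
    while 1 - (1 - p) ** n < target:
        n += 1
    return n


if __name__ == "__main__":
    # Three hypothetical participants, ten known problems in the system.
    found = [{1, 2, 3}, {2, 4}, {1, 5}]
    print(overall_discovery_rate(found, 10))
    print(required_sample_size(0.31, target=0.80))
```

Under this model the required sample size is very sensitive to p: a lower per-participant discovery rate sharply increases the number of participants needed to reach the same target, which is why the estimate of p matters so much for cost.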