Number of people required for usability evaluation: the 10±2 rule

  • Authors:
  • Wonil Hwang; Gavriel Salvendy

  • Affiliations:
  • Soongsil University in Seoul, Korea; Purdue University in West Lafayette, Indiana and Tsinghua University in Beijing, P.R. China

  • Venue:
  • Communications of the ACM
  • Year:
  • 2010

Quantified Score

Hi-index 48.23


Abstract

Introduction

Usability evaluation is essential to ensure that newly released software products are easy to use, efficient and effective in reaching goals, and satisfactory to users. For example, when a software company wants to develop and sell a new product, it needs to evaluate the product's usability before launching it on the market, to avoid the possibility that the product contains usability problems, which range from cosmetic problems to severe functional problems.

Three widely used methods for usability evaluation are Think Aloud (TA), Heuristic Evaluation (HE) and Cognitive Walkthrough (CW). The TA method is commonly employed with lab-based user testing, although variants exist, including thinking aloud at the user's workplace instead of in a lab. What we discuss here is the TA method combined with lab-based user testing, in which test users use products while simultaneously and continuously thinking aloud, and experimenters record users' behaviors and verbal protocols in the laboratory. HE is a usability inspection method in which a small number of evaluators find usability problems in a user interface design by examining the interface and judging its compliance with well-known usability principles, called heuristics. CW is a theory-based method in which evaluators evaluate every step necessary to perform a scenario-based task and look for usability problems that would interfere with learning by exploration.

These three methods have their own advantages and disadvantages. For instance, the TA method provides good qualitative data from a small number of test users, but the laboratory environment may influence test users' behavior. HE is cheap, fast and easy to use, but it often finds overly specific and low-priority usability problems, including some that are not real problems. CW helps find mismatches between users' and designers' conceptualizations of a task, but applying it requires extensive knowledge of cognitive psychology and technical details. However, although these advantages and disadvantages show the overall characteristics of the three major usability evaluation methods, they do not let us compare the methods quantitatively or see their efficiency clearly.

Because one of the reasons so-called discounted methods, such as HE and CW, were developed is to save the costs of usability evaluation, cost-related criteria for comparing usability evaluation methods are meaningful to usability practitioners as well as usability researchers. One of the most disputed issues related to the cost of usability evaluation is sample size: how many users or evaluators are needed to achieve a targeted usability evaluation performance, for example, an overall discovery rate of 80%?

The sample size for usability evaluation is known to depend on an estimate of the problem discovery rate across participants. The overall discovery rate is a common quantitative measure used in most usability evaluation studies to show the effectiveness of a specific usability evaluation method. It is also called the overall detection rate or thoroughness measure, and is the ratio of 'the sum of unique usability problems detected by all experiment participants' to 'the number of usability problems that exist in the evaluated systems', ranging between 0 and 1. Overall discovery rates have been reported more than any other criterion measure in usability evaluation experiments and are also a key component in projecting the required sample size for a usability evaluation study. Thus, how many test users or evaluators participate in a usability evaluation is a critical issue for its cost-effectiveness.
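
As a rough illustration of the overall discovery rate defined above, the following Python sketch computes the ratio from per-participant sets of detected problems. The function names, the problem labels and the example data are hypothetical and do not come from the article. The second function is an assumption as well: it uses the common binomial model, in which every participant independently finds each problem with the same probability p, so that n participants are expected to find a proportion 1 - (1 - p)^n of the problems. That model is one frequently used way to project sample size; it is not presented here as the authors' own procedure.

```python
import math
from typing import Iterable, Set


def overall_discovery_rate(found_by_participant: Iterable[Set[str]],
                           total_problems: int) -> float:
    """Unique problems detected by all participants / problems that exist."""
    if total_problems <= 0:
        raise ValueError("total_problems must be positive")
    # Union of the per-participant sets counts each problem only once.
    unique_found = set().union(*found_by_participant)
    if len(unique_found) > total_problems:
        raise ValueError("more unique problems found than are said to exist")
    return len(unique_found) / total_problems


def participants_needed(p: float, target: float) -> int:
    """Smallest n with 1 - (1 - p)**n >= target.

    ASSUMPTION: binomial model with a single per-participant discovery
    probability p; this model is not taken from the text above.
    """
    if not (0.0 < p < 1.0 and 0.0 < target < 1.0):
        raise ValueError("p and target must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


# Hypothetical data: three participants, five known problems (P1..P5).
sessions = [{"P1", "P2"}, {"P2", "P3"}, {"P1", "P4"}]
rate = overall_discovery_rate(sessions, total_problems=5)  # 4 of 5 -> 0.8
needed = participants_needed(p=0.3, target=0.8)            # 5 participants
```

In practice the number of problems that truly exist in a system is rarely known, so the denominator is usually an estimate, and any projected sample size inherits that uncertainty.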