Testability and validity of WCAG 2.0: the expertise effect
The Web Content Accessibility Guidelines (WCAG) 2.0 separate testing into "machine" and "human" audits, and further classify human testability as "reliably human testable" or "not reliably testable"; it is human testability that is the focus of this paper. We investigated the likelihood that "at least 80% of knowledgeable human evaluators would agree on the conclusion" of an accessibility audit, in order to estimate the percentage of success criteria that can be described as reliably human testable and the percentage that cannot. We recruited twenty-five experienced evaluators to audit four pages for WCAG 2.0 conformance; the pages were chosen to differ in layout, complexity, and accessibility support, yielding a small but varied sample. We found that 80% agreement between experienced evaluators almost never occurred: average agreement was in the 70--75% range, while the error rate was around 29%. Trained but novice evaluators performing the same audits showed the same level of agreement as the experienced ones, but a 6--13% reduction in validity; the validity an untrained evaluator would attain can only be conjectured. Expertise appears to improve the ability to avoid false positives by 19%. Finally, pooling the results of two independent experienced evaluators would be the best option, capturing at most 76% of the true problems while producing only 24% false positives; any other independent combination of audits would achieve worse results. This means that an 80% agreement target is not attainable when audits are conducted without communication between evaluators, even by experienced evaluators working on pages similar to those used in this experiment; that the error rate even for experienced evaluators is relatively high; and that untrained accessibility auditors, be they developers or quality testers from other domains, would do much worse.
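To make the pooling idea concrete, the following is a minimal, hypothetical Python sketch (not the paper's analysis code) of how combining two evaluators' reported problems by union trades sensitivity against false positives. The problem IDs, evaluator names, and counts are invented for illustration only; they do not reproduce the study's data or its 76%/24% figures.

from itertools import combinations

# Ground truth: the set of real accessibility problems on a page
# (hypothetical IDs for illustration).
true_problems = {f"p{i}" for i in range(10)}

# Each evaluator reports a set of problems: some real (p*), some
# spurious (x* = false positives). All sets are invented.
reports = {
    "eval_A": {"p0", "p1", "p2", "p3", "p4", "x1"},
    "eval_B": {"p2", "p3", "p4", "p5", "p6", "x2"},
    "eval_C": {"p0", "p4", "p5", "p7", "x1", "x3"},
}

def validity(found):
    """Share of reported problems that are real (1 - false-positive share)."""
    return len(found & true_problems) / len(found)

def sensitivity(found):
    """Share of real problems that were captured."""
    return len(found & true_problems) / len(true_problems)

# Pool every pair of independent evaluators by taking the union of
# their reports: sensitivity can only grow or stay equal, but the
# pooled report also accumulates both evaluators' false positives.
for (a, fa), (b, fb) in combinations(reports.items(), 2):
    pooled = fa | fb
    print(f"{a}+{b}: captures {sensitivity(pooled):.0%} of true problems, "
          f"{1 - validity(pooled):.0%} of reports are false positives")

Under these assumed numbers, any pooled pair captures more true problems than either evaluator alone, which mirrors why pooling two independent experienced evaluators is the best option reported above, while the accumulated false positives explain why adding further independent audits eventually does worse.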