Benchmarking web accessibility evaluation tools: measuring the harm of sole reliance on automated tests

  • Authors:
  • Markel Vigo; Justin Brown; Vivienne Conway

  • Affiliations:
  • University of Manchester, Manchester, UK; Edith Cowan University, Perth, Australia; Edith Cowan University, Perth, Australia

  • Venue:
  • Proceedings of the 10th International Cross-Disciplinary Conference on Web Accessibility
  • Year:
  • 2013


Abstract

The use of web accessibility evaluation tools is a widespread practice. Evaluation tools are heavily employed because they reduce the burden of identifying accessibility barriers. However, over-reliance on automated tests often leads to setting aside further testing that entails expert evaluation and user tests. In this paper we empirically characterise the capabilities of current automated evaluation tools. To do so, we investigate the effectiveness of six state-of-the-art tools by analysing their coverage, completeness and correctness with regard to WCAG 2.0 conformance. We corroborate that relying on automated tests alone has undesirable consequences. Coverage is very narrow: at most 50% of the success criteria are covered. Similarly, completeness ranges between 14% and 38%; however, some of the tools with higher completeness scores produce lower correctness scores (66-71%), because catching as many violations as possible tends to increase false positives. Therefore, relying on automated tests alone means that one in two success criteria will not even be analysed, and among those analysed only four out of ten violations will be caught, at the further risk of generating false positives.
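
The three metrics in the abstract can be read as set measures over a ground truth of expert-identified violations. The sketch below uses hypothetical data and function names (not the authors' actual evaluation harness) to show, under that assumption, how coverage, completeness and correctness would be computed for a single tool, and why chasing completeness can depress correctness once false positives enter the report.

```python
# Hypothetical sketch of the three metrics for one evaluation tool.
# Ground truth: WCAG 2.0 success criteria mapped to the true violations
# found by expert evaluation on a page sample (illustrative data only).
ground_truth = {
    "1.1.1": {"v1", "v2", "v3"},
    "1.4.3": {"v4"},
    "2.4.4": {"v5", "v6"},
    "3.3.2": {"v7"},
}

# One tool's report: the success criteria it tests at all, and the
# violations it flags ("v8" is a false positive in this example).
tool_report = {
    "tested_criteria": {"1.1.1", "1.4.3"},
    "reported": {"v1", "v2", "v4", "v8"},
}

def coverage(report, truth):
    """Share of success criteria the tool tests at all."""
    return len(report["tested_criteria"]) / len(truth)

def completeness(report, truth):
    """Share of true violations the tool catches (recall-like)."""
    all_true = set().union(*truth.values())
    return len(report["reported"] & all_true) / len(all_true)

def correctness(report, truth):
    """Share of the tool's reports that are true violations (precision-like)."""
    all_true = set().union(*truth.values())
    return len(report["reported"] & all_true) / len(report["reported"])

print(f"coverage:     {coverage(tool_report, ground_truth):.0%}")   # 50%
print(f"completeness: {completeness(tool_report, ground_truth):.0%}")  # 43%
print(f"correctness:  {correctness(tool_report, ground_truth):.0%}")   # 75%
```

With these toy numbers, flagging an extra, incorrect violation ("v8") raises nothing but the false-positive count: completeness stays at three of seven true violations, while correctness drops from 100% to 75%, mirroring the completeness/correctness trade-off the abstract reports.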