Extracting content from accessible web pages

Authors: Suhit Gupta; Gail Kaiser
Affiliations: New York, NY; New York, NY
Venue: W4A '05 Proceedings of the 2005 International Cross-Disciplinary Workshop on Web Accessibility (W4A)
Year: 2005

Citing 3

Seeing the whole in parts: text summarization for web browsing on handheld devices
Proceedings of the 10th international conference on World Wide Web

DOM-based content extraction of HTML documents
WWW '03 Proceedings of the 12th international conference on World Wide Web

Automating Content Extraction of HTML Documents
World Wide Web

Cited 11

Personalizable edge services for web accessibility
W4A '06 Proceedings of the 2006 international cross-disciplinary workshop on Web accessibility (W4A): Building the mobile web: rediscovering accessibility?

A Semantic-web based framework for developing applications to improve accessibility in the WWW
W4A '06 Proceedings of the 2006 international cross-disciplinary workshop on Web accessibility (W4A): Building the mobile web: rediscovering accessibility?

SADIe: Structural semantics for accessibility and device independence
ACM Transactions on Computer-Human Interaction (TOCHI)

A Personal Web Information/Knowledge Retrieval System
Proceedings of the 2008 conference on Information Modelling and Knowledge Bases XIX

A New Partial Information Extraction Method for Personal Mashup Construction
Proceedings of the 2010 conference on Information Modelling and Knowledge Bases XXI

Web mediators for accessible browsing
ERCIM'06 Proceedings of the 9th conference on User interfaces for all

Identifying Behavioral Strategies of Visually Impaired Users to Improve Access to Web Content
ACM Transactions on Accessible Computing (TACCESS)

Effectiveness, productivity and satisfaction of persons with sight and motor disabilities when using dynamic text-only pages
Journal of Web Engineering

Friend Lens: novel web content sharing through strategic manipulation of cached html
International Journal of Web Based Communities

Optimizing the user environment: leading towards an accessible and usable experience
Accessible Design'05 Proceedings of the 2005 international conference on Accessible Design in the Digital World

Improving web accessibility for dichromat users through contrast preservation
ICCHP'12 Proceedings of the 13th international conference on Computers Helping People with Special Needs - Volume Part I

Quantified Score

Hi-index	0.00

Abstract

Web pages often contain clutter (such as ads, unnecessary animations and extraneous links) around the body of an article, which distracts users from the actual content. This can be especially inconvenient for blind and visually impaired users. The W3C's Web Accessibility Initiative (WAI) has defined a set of guidelines to make web pages more compatible with tools built specifically for persons with disabilities. While this initiative has put forth an excellent set of principles, many websites unfortunately continue to be inaccessible as well as cluttered. To address the clutter problem, we have developed a framework that employs a host of heuristics, in the form of tunable filters, for content extraction. Our hypothesis is that automatically filtering out selected elements from websites will leave the base content that users are interested in and, as a side effect, render the sites more accessible. Although our heuristics are intuition-based rather than derived from the W3C accessibility guidelines, we expected that they would have little impact on web pages that fully comply with those guidelines. We were wrong: some (technically) accessible web pages still include significant clutter. This paper discusses our content extraction framework and its application to accessible web pages.
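
The idea of a tunable filter that removes selected elements and keeps the remaining text can be illustrated with a minimal Python sketch. This is not the authors' framework or any of its actual heuristics: the class name ClutterFilter, the function extract_text, and the default set of dropped tags are all hypothetical, chosen only for illustration, and the sketch assumes reasonably well-formed markup in which every non-void element is explicitly closed.

```python
from html.parser import HTMLParser

# Hypothetical defaults: the tag set is illustrative, not taken from the paper.
DEFAULT_DROP_TAGS = frozenset({"script", "style", "iframe", "object"})

# Void elements never have a closing tag, so they cannot open a subtree.
VOID_TAGS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


class ClutterFilter(HTMLParser):
    """Collects text that lies outside any element named in drop_tags."""

    def __init__(self, drop_tags=DEFAULT_DROP_TAGS):
        super().__init__(convert_charrefs=True)
        self.drop_tags = frozenset(drop_tags)  # the "tunable" part of the filter
        self.skip_depth = 0                    # > 0 while inside a dropped subtree
        self.chunks = []                       # text fragments that survive filtering

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # Entering a dropped element, or nesting deeper inside one.
        if self.skip_depth or tag in self.drop_tags:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip_depth:
            self.chunks.append(text)


def extract_text(html, drop_tags=DEFAULT_DROP_TAGS):
    """Return the page text with every dropped subtree removed."""
    parser = ClutterFilter(drop_tags)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    page = (
        "<html><body><nav><a href='/'>Home</a></nav>"
        "<p>Main story.</p><script>var x = 1;</script></body></html>"
    )
    print(extract_text(page))
    # Tightening the filter: also treat navigation blocks as clutter.
    print(extract_text(page, DEFAULT_DROP_TAGS | {"nav"}))
```

Passing a different drop_tags set is the tuning knob in this sketch: a stricter set removes more of the page, and a looser set keeps more of it.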