Web pages often contain clutter (such as unnecessary images and extraneous links) around the body of an article that distracts a user from the actual content. Extraction of "useful and relevant" content from web pages has many applications, including cell phone and PDA browsing, speech rendering for the visually impaired, and text summarization. Most approaches to making content more readable involve changing font size or removing HTML and data components such as images, which takes away from a webpage's inherent look and feel. Unlike "Content Reformatting," which aims to reproduce the entire webpage in a more convenient form, our solution directly addresses "Content Extraction." We have developed a framework that employs an easily extensible set of techniques and incorporates the advantages of previous work on content extraction. Our key insight is to work with DOM trees, a W3C-specified interface that allows programs to dynamically access document structure, rather than with raw HTML markup. We have implemented our approach in a publicly available Web proxy that extracts content from HTML web pages. This proxy can be used both centrally, administered for groups of users, and by individuals for personal browsers. After receiving user feedback on the proxy, we also created a revised version designed with improved performance and accessibility in mind.
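To make the DOM-based idea concrete, the following is a minimal sketch of content extraction over a DOM tree, written with Python's standard `xml.dom.minidom`. It is not the framework or proxy described above: the tag list, the link-text threshold, and all function names are hypothetical choices made for illustration, and the sketch assumes the page has already been tidied into well-formed XHTML.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

# Hypothetical filter settings chosen for illustration; not taken from the paper.
REMOVE_TAGS = {"img", "script", "style", "iframe"}
CONTAINER_TAGS = {"div", "ul", "ol", "table", "td"}
LINK_TEXT_RATIO = 0.5  # prune containers whose text is mostly link text

SAMPLE_PAGE = """<html><body>
<div><a href="/a">Home</a> <a href="/b">Sports</a> <a href="/c">Weather</a></div>
<div><p>The council approved the new budget on Tuesday.</p><img src="ad.png"/></div>
</body></html>"""


def text_length(node):
    """Count non-whitespace characters of all text beneath a node."""
    if node.nodeType == Node.TEXT_NODE:
        return len("".join(node.data.split()))
    return sum(text_length(child) for child in node.childNodes)


def link_text_length(node):
    """Count non-whitespace characters that sit inside <a> elements."""
    if node.nodeType != Node.ELEMENT_NODE:
        return 0
    if node.tagName == "a":
        return text_length(node)
    return sum(link_text_length(child) for child in node.childNodes)


def prune(node):
    """Recursively remove clutter elements from the DOM tree in place."""
    for child in list(node.childNodes):  # copy, since we mutate while iterating
        if child.nodeType != Node.ELEMENT_NODE:
            continue
        total = text_length(child)
        is_link_list = (
            child.tagName in CONTAINER_TAGS
            and total > 0
            and link_text_length(child) / total > LINK_TEXT_RATIO
        )
        if child.tagName in REMOVE_TAGS or is_link_list:
            node.removeChild(child)
        else:
            prune(child)


def collect_text(node):
    """Concatenate all text beneath a node, separated by spaces."""
    if node.nodeType == Node.TEXT_NODE:
        return node.data
    return " ".join(collect_text(child) for child in node.childNodes)


def extract_content(xhtml):
    """Parse well-formed XHTML, prune clutter, and return the remaining text."""
    document = minidom.parseString(xhtml)
    prune(document.documentElement)
    return " ".join(collect_text(document.documentElement).split())


if __name__ == "__main__":
    print(extract_content(SAMPLE_PAGE))
```

On the sample page, the navigation block is dropped because nearly all of its text sits inside links, and the image is dropped by tag name, so only the article sentence should remain. Operating on the tree rather than on raw markup is what lets each filter be a small, independent rule, in the spirit of the extensible set of techniques described above.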