Regression testing for wrapper maintenance
AAAI '99/IAAI '99 Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh Innovative applications of artificial intelligence conference
Automating Web navigation with the WebVCR
Proceedings of the 9th international World Wide Web conference on Computer networks: the international journal of computer and telecommunications networking
Wrapper induction: efficiency and expressiveness
Artificial Intelligence - Special issue on Intelligent internet systems
A brief survey of web data extraction tools
ACM SIGMOD Record
Proceedings of the 27th International Conference on Very Large Data Bases
Semi-Automatic Wrapper Generation for Commercial Web Sources
Proceedings of the IFIP TC8 / WG8.1 Working Conference on Engineering Information Systems in the Internet Context
Extracting structured data from Web pages
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Schema-guided wrapper maintenance for web-data extraction
WIDM '03 Proceedings of the 5th ACM international workshop on Web information and data management
Efficient Wrapper Reinduction from Dynamic Web Sources
WI '04 Proceedings of the 2004 IEEE/WIC/ACM International Conference on Web Intelligence
Web data extraction based on partial tree alignment
WWW '05 Proceedings of the 14th international conference on World Wide Web
ITPilot: A Toolkit for Industrial-Strength Web Data Extraction
WI '05 Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence
Automatically Maintaining Wrappers for Web Sources
IDEAS '05 Proceedings of the 9th International Database Engineering & Application Symposium
Wrapper maintenance: a machine learning approach
Journal of Artificial Intelligence Research
Providing personalized mashups within the context of existing web applications
WISE'07 Proceedings of the 8th international conference on Web information systems engineering
A substantial portion of web data follows some kind of underlying structure. To let software programs benefit fully from these “semi-structured” web sources, wrapper programs are built to provide a “machine-readable” view over them. A significant problem with wrappers is that, because web sources are autonomous, they may undergo changes that invalidate the current wrapper, which makes automatic maintenance an important research issue. Web wrappers must perform two kinds of tasks: automatically navigating through websites and automatically extracting structured data from HTML pages. While several previous works have addressed the automatic maintenance of the components that perform data extraction, the problem of automatically maintaining the required web navigation sequences remains, to the best of our knowledge, unaddressed. In this paper we propose and experimentally validate a set of novel heuristics and algorithms to fill this gap.