We address the problem of automatically maintaining Web wrappers, which data integration systems use to encapsulate access to Web information providers. Wrapper maintenance is critical because providers often change the format and/or structure of their pages, making wrappers inoperable. The solution we propose extends the conventional wrapper architecture with a novel component for automatic maintenance and recovery. We treat automatic recovery as a special type of classification problem and use ensemble machine learning methods to build alternative views of provider pages. We combine the extraction rules of conventional wrappers with content features of the extracted information to achieve accurate recovery from three types of format changes, namely content, context and structural changes. We report recovery performance results for format changes at widely used Web providers.
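To make the idea of combining alternative views concrete, here is a minimal sketch, not the method evaluated above. It assumes three hypothetical views (`content_view`, `context_view`, `structure_view`), each of which either labels a candidate string from the changed page with an attribute name or abstains. A simple plurality vote combines them. The `Candidate` record, the `OLD_RULES` path table, the regular expressions and `recover_label` are all illustrative assumptions. The point is that when one view breaks (for example, a structural change invalidates the old extraction rule), the other views can still agree on the label.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Candidate:
    """A string found on a (possibly changed) page; hypothetical representation."""
    text: str     # the string itself (content)
    context: str  # label text that precedes it on the page (context)
    path: str     # tag path of the enclosing element (structure)


View = Callable[[Candidate], Optional[str]]


def content_view(c: Candidate) -> Optional[str]:
    # Content features: the shape of the extracted string itself.
    if re.fullmatch(r"\$\d+(\.\d{2})?", c.text):
        return "price"
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", c.text):
        return "date"
    return None


def context_view(c: Candidate) -> Optional[str]:
    # Context features: a nearby label such as "Price:".
    ctx = c.context.lower()
    if "price" in ctx:
        return "price"
    if "date" in ctx:
        return "date"
    return None


# Extraction rules of the old wrapper: tag path -> attribute (hypothetical).
OLD_RULES = {"table/tr/td[2]": "price", "table/tr/td[3]": "date"}


def structure_view(c: Candidate) -> Optional[str]:
    # Structural view: reuse the conventional wrapper's rule if the path still exists.
    return OLD_RULES.get(c.path)


def recover_label(c: Candidate, views: Sequence[View]) -> Optional[str]:
    """Label a candidate by plurality vote over the views that do not abstain."""
    labels = [view(c) for view in views]
    votes = Counter(label for label in labels if label is not None)
    if not votes:
        return None
    (best, count), *rest = votes.most_common()
    # Refuse to decide when the top two labels are tied.
    if rest and rest[0][1] == count:
        return None
    return best


if __name__ == "__main__":
    views = [content_view, context_view, structure_view]
    # After a structural change the old path is gone, but content and context still agree.
    moved = Candidate(text="$19.99", context="Price:", path="div/span[1]")
    print(recover_label(moved, views))
```

Abstaining views and a tie-breaking refusal keep the sketch conservative: a candidate is relabeled only when the surviving views agree more often than they disagree. The paper's ensemble methods are more elaborate than this vote.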