Schema-guided wrapper maintenance for web-data extraction
WIDM '03 Proceedings of the 5th ACM international workshop on Web information and data management
Extracting data from Web pages using wrappers is a fundamental problem that arises in a wide variety of applications of great practical interest. Web-data extraction raises two main issues: wrapper generation and wrapper maintenance. In this paper, we propose a novel schema-guided approach to automatic wrapper maintenance. It is based on the observation that, despite various page changes, many important features of the pages are preserved, such as the syntactic patterns, annotations, and hyperlinks of the extracted data items. Our approach uses these preserved features to identify the locations of the desired values in the changed pages and repairs the wrappers accordingly by inducing semantic blocks from the HTML tree. Extensive experiments on real Web sites show that the proposed approach can effectively maintain wrappers to extract the desired data with high accuracy.
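To make the idea of preserved features concrete, the following is a minimal sketch, not the algorithm from the paper. It assumes that a data item can be re-located in a changed page by three features named in the abstract: a syntactic pattern (here a regular expression), an annotation (here simplified to the text node immediately preceding the value), and whether the value sits inside a hyperlink. The names `TextNodeCollector` and `locate`, the scoring scheme, and the sample pages are all hypothetical, and the induction of semantic blocks from the HTML tree is not attempted.

```python
import re
from html.parser import HTMLParser

# Tags that never take a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class TextNodeCollector(HTMLParser):
    """Collect (tag_path, text, inside_link) for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.nodes = []   # text nodes in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray end tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append(("/".join(self.stack), text, "a" in self.stack))


def locate(html, pattern, annotation=None, is_link=False):
    """Return (tag_path, text) of the text node that best matches the
    preserved features, or None if no node matches the pattern.

    Hypothetical scoring: one point for matching the syntactic pattern,
    one for following the expected annotation, one for link agreement.
    """
    collector = TextNodeCollector()
    collector.feed(html)
    collector.close()

    best, best_score = None, 0
    previous_text = ""
    for path, text, in_link in collector.nodes:
        if re.fullmatch(pattern, text):
            score = 1
            if annotation is not None and previous_text == annotation:
                score += 1
            if in_link == is_link:
                score += 1
            if score > best_score:
                best, best_score = (path, text), score
        previous_text = text
    return best


PRICE = r"\$\d+\.\d{2}"  # syntactic pattern of the extracted value

# Illustrative pages: the layout changes from a table to a div.
old_page = "<table><tr><td>Price:</td><td>$12.50</td></tr></table>"
new_page = ("<div><span>Shipping:</span> <b>$3.00</b>"
            "<span>Price:</span> <b>$14.99</b></div>")

print(locate(old_page, PRICE, annotation="Price:"))  # ('table/tr/td', '$12.50')
print(locate(new_page, PRICE, annotation="Price:"))  # ('div/b', '$14.99')
```

In the changed page both `$3.00` and `$14.99` match the syntactic pattern, and the annotation is what separates them. The path returned for the new page (`div/b`) is the kind of information a repaired wrapper would need, although a real system would have to handle repeated records, nested structures, and ambiguous annotations that this sketch ignores.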