A brief survey of web data extraction tools
ACM SIGMOD Record
Data extraction and label assignment for web databases
WWW '03 Proceedings of the 12th international conference on World Wide Web
Extracting structured data from Web pages
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Mining data records in Web pages
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Fully automatic wrapper generation for search engines
WWW '05 Proceedings of the 14th international conference on World Wide Web
Web wrapper induction: a brief survey
AI Communications
Joint optimization of wrapper generation and template detection
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Efficient record-level wrapper induction
Proceedings of the 18th ACM conference on Information and knowledge management
No Code Required: Giving Users Tools to Transform the Web
No Code Required: Giving Users Tools to Transform the Web
Hi-index | 0.00 |
We present a demonstration of an interactive wrapper induction system, called Pictor, which is able to minimize labeling cost, yet extract data with high accuracy from a website. Our demonstration will introduce two proposed technologies: record-level wrappers and a wrapper-assisted labeling strategy. These approaches allow Pictor to exploit previously generated wrappers, in order to predict similar labels in a partially labeled webpage or a completely new webpage. Our experiment results show the effectiveness of the Pictor system.