The paper investigates techniques for extracting data from large sets of dynamically generated web pages. Pages generated dynamically by a single web site share a common semi-structured layout across all their data objects. A wrapper for such pages is defined as the template common to them, into which each page embeds its own data objects. Information extraction proceeds in three steps: (a) extraction of the data-rich section from each web page, (b) automated generation of the wrapper, and (c) data extraction from each web page by comparing it with the wrapper. Wrapper generation is the most important part of this process, and our focus was on developing improved techniques for it. Our technique is fully automated, and we achieved a good increase in both accuracy and speed.
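To make steps (b) and (c) concrete, the following is a minimal sketch of the general idea only, not the technique developed in the paper. It assumes that each page has already been reduced to its data-rich section (step (a) is skipped), that pages can be tokenized into tags and text runs with a simple regular expression, and that a generic sequence alignment from Python's `difflib` is an acceptable stand-in for the wrapper-generation method. The function names `tokenize`, `induce_wrapper`, and `extract` are hypothetical.

```python
import re
from difflib import SequenceMatcher


def tokenize(html):
    """Split a page into tag tokens and text tokens (assumed tokenization)."""
    pieces = re.findall(r"<[^>]+>|[^<]+", html)
    return [p.strip() for p in pieces if p.strip()]


def induce_wrapper(pages):
    """Step (b), sketched: keep only the tokens shared, in order, by all pages.

    This uses difflib alignment as a stand-in; it is not the paper's algorithm.
    """
    wrapper = tokenize(pages[0])
    for page in pages[1:]:
        tokens = tokenize(page)
        matcher = SequenceMatcher(None, wrapper, tokens, autojunk=False)
        common = []
        for block in matcher.get_matching_blocks():
            common.extend(wrapper[block.a:block.a + block.size])
        wrapper = common
    return wrapper


def extract(page, wrapper):
    """Step (c), sketched: tokens of the page that the wrapper does not cover."""
    tokens = tokenize(page)
    matcher = SequenceMatcher(None, wrapper, tokens, autojunk=False)
    data = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" mark page tokens absent from the template.
        if tag in ("insert", "replace"):
            data.extend(tokens[j1:j2])
    return data


if __name__ == "__main__":
    # Two made-up pages that share one template (illustrative data only).
    pages = [
        "<html><body><h1>Book A</h1><p>Price: 10</p></body></html>",
        "<html><body><h1>Book B</h1><p>Price: 25</p></body></html>",
    ]
    wrapper = induce_wrapper(pages)
    for page in pages:
        print(extract(page, wrapper))
```

On the two made-up pages, the induced wrapper is the shared tag skeleton, and extraction returns the text tokens that differ between pages. A text token that happens to be identical on every page would be absorbed into the wrapper, which is one reason a real system needs more than plain alignment.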