Local search engines allow geographically constrained searching of businesses and their products or services. Some local search engines use crawlers to index Web page contents. These crawlers mostly index Web pages that are reachable through hyperlinks and that include desirable location information. It is extremely important for local search engines to also crawl additional high-quality "local" content (e.g., user reviews) that is available in the Deep Web. Much of this content is hidden behind search forms and takes the form of structured data, the volume of which is increasing very rapidly. In this paper, we present our experiences in crawling and extracting a wide variety of local structured data from a large number of Deep Web resources. We discuss the challenges in crawling such sources and, based on our experience, offer some effective principles to address them. Our experimental results on several Deep Web sources with local content show that the techniques discussed are highly effective.