Experiences in crawling deep web in the context of local search

  • Authors:
  • Dheerendranath Mundluru; Xiongwu Xia

  • Affiliations:
  • Local.com Corporation, Irvine, CA, USA; Local.com Corporation, Irvine, CA, USA

  • Venue:
  • Proceedings of the 2nd international workshop on Geographic information retrieval
  • Year:
  • 2008

Abstract

Local search engines allow geographically constrained searching for businesses and their products or services. Some local search engines use crawlers to index Web page content. These crawlers mostly index Web pages that are reachable through hyperlinks and that include the desired location information. It is also extremely important for local search engines to crawl additional high-quality "local" content (e.g., user reviews) that is available in the Deep Web. Much of this content is hidden behind search forms and takes the form of structured data, whose volume is increasing very rapidly. In this paper, we present our experiences in crawling and extracting a wide variety of local structured data from a large number of Deep Web resources. We discuss the challenges in crawling such sources and, based on our experience, offer some effective principles for addressing them. Our experimental results on several Deep Web sources with local content show that the techniques discussed are highly effective.