A Novel Architecture for Deep Web Crawler

  • Authors:
  • Dilip Kumar Sharma; A. K. Sharma

  • Affiliations:
  • Shobhit University, India; YMCA University of Science and Technology, India

  • Venue:
  • International Journal of Information Technology and Web Engineering

  • Year:
  • 2011

Abstract

A traditional crawler picks up a URL, retrieves the corresponding page, and extracts its links, adding them to the queue. A deep Web crawler, after adding links to the queue, also checks the page for forms. If forms are present, it processes them and retrieves the required information. Various techniques have been proposed for crawling deep Web information, but much deep Web content remains undiscovered. In this paper, the authors analyze and compare important deep Web crawling techniques to identify their relative limitations and advantages. To minimize the limitations of existing deep Web crawlers, a novel architecture is proposed based on the QIIIEP specifications (Sharma & Sharma, 2009). The proposed architecture is cost-effective and supports both privatized search and general search for deep Web data hidden behind HTML forms.
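
The crawl loop described in the abstract can be illustrated with a minimal Python sketch. It is not the architecture proposed in the paper and does not implement QIIIEP. It only shows the generic steps: take a URL from the queue, retrieve the page, enqueue its links, then check for forms. The names `LinkAndFormParser`, `crawl`, `fetch`, and the in-memory `PAGES` site are hypothetical and exist only for illustration. Form processing (filling in and submitting the form) is deliberately left out, since the abstract does not specify how it is done; the sketch merely records where forms were found.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndFormParser(HTMLParser):
    """Collects anchor hrefs and form actions from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.forms = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "form":
            # A missing action means the form submits to the page itself.
            self.forms.append(attrs.get("action") or "")


def crawl(seed, fetch, max_pages=50):
    """Breadth-first crawl; `fetch` maps a URL to HTML text or None.

    Returns the visited URLs and a list of (page URL, form target URL)
    pairs. A deep Web crawler would process each form at that point.
    """
    queue = deque([seed])
    seen = {seed}
    visited = []
    forms_found = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()           # pick up a URL
        html = fetch(url)               # retrieve the corresponding page
        if html is None:
            continue
        visited.append(url)
        parser = LinkAndFormParser()
        parser.feed(html)
        for href in parser.links:       # extract links, add them to the queue
            link = urljoin(url, href)
            if link not in seen:
                seen.add(link)
                queue.append(link)
        for action in parser.forms:     # deep Web step: check for forms
            forms_found.append((url, urljoin(url, action)))
    return visited, forms_found


# Hypothetical in-memory site standing in for real HTTP retrieval.
PAGES = {
    "http://example.test/": '<a href="/a">A</a><a href="/search">S</a>',
    "http://example.test/a": '<a href="/">home</a>',
    "http://example.test/search": '<form action="/results"><input name="q"></form>',
}

visited, forms_found = crawl("http://example.test/", PAGES.get)
```

Passing `fetch` as a parameter keeps the sketch runnable without network access; a real crawler would supply a function that performs the HTTP request and would replace the form-recording step with actual form analysis and submission.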