Topical crawling on the web through local site-searches

  • Authors:
  • Yaling Liu;Arvin Agah

  • Affiliations:
  • Department of Electrical Engineering and Computer Science, University of Kansas;Department of Electrical Engineering and Computer Science, University of Kansas

  • Venue:
  • Journal of Web Engineering
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper, we investigate the feasibility of discovering topical resources by combining Web searches and local site-searches. Existing techniques of topical resource discovery consist of crawling the Web and searching the Web. The former typically analyses linkage among Web pages to estimate the relevance of an unseen document to a topic. The latter exploits the indices of generic search engines to discover documents relevant to a topic. Although the local site-search has been a simple and convenient feature of a Web site for human users to quickly locate desired information within the site that hosts tremendous number of documents, this feature has been ignored by the techniques of automatic topical resource discovery. A typical local site-search returns a list of titles, hyperlinks, and snippets of relevant documents that can be used to estimate the relevance of the documents to the topic before actually fetching the documents. We propose an operational model to make use of this simple feature, and address how this model can be realized. Experiments have shown that this simple but efficient approach can provide much more precise estimations than a sophisticated intelligent topical crawler.