Topical crawling on the web through local site-searches

Authors:
Yaling Liu;Arvin Agah
Affiliations:
Department of Electrical Engineering and Computer Science, University of Kansas;Department of Electrical Engineering and Computer Science, University of Kansas
Venue:
Journal of Web Engineering
Year:
2013

Abstract

In this paper, we investigate the feasibility of discovering topical resources by combining Web searches and local site-searches. Existing techniques for topical resource discovery fall into two groups: crawling the Web and searching the Web. The former typically analyses the linkage among Web pages to estimate the relevance of an unseen document to a topic. The latter exploits the indices of generic search engines to discover documents relevant to a topic. Local site-search is a simple and convenient feature that lets human users quickly locate desired information within a site that hosts a tremendous number of documents, yet it has been ignored by techniques for automatic topical resource discovery. A typical local site-search returns a list of titles, hyperlinks, and snippets of relevant documents, which can be used to estimate the relevance of those documents to the topic before actually fetching them. We propose an operational model that makes use of this simple feature and address how the model can be realized. Experiments have shown that this simple but efficient approach can provide much more precise estimations than a sophisticated intelligent topical crawler.
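
The idea of estimating relevance from site-search results before fetching can be illustrated with a minimal sketch. It is not the operational model proposed in the paper, whose details the abstract does not give. It assumes a site-search has already returned a list of title, hyperlink, and snippet triples, and it scores each one against the topic with plain term-frequency cosine similarity. The names `SearchResult`, `tokenize`, `cosine`, and `rank_results`, the scoring function, and the example URLs are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SearchResult:
    """One entry of a (hypothetical) local site-search result list."""
    title: str
    url: str
    snippet: str


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_results(topic: str, results: List[SearchResult]) -> List[Tuple[float, SearchResult]]:
    """Score each result from its title and snippet only, highest score first.

    Assumption: cosine similarity stands in for whatever estimator a real
    system would use; no document is fetched at this stage.
    """
    topic_vec = tokenize(topic)
    scored = [(cosine(topic_vec, tokenize(f"{r.title} {r.snippet}")), r) for r in results]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Usage with made-up results standing in for a real site-search response.
results = [
    SearchResult("Campus parking map", "http://example.org/b",
                 "Where to park on campus"),
    SearchResult("Focused crawler design", "http://example.org/a",
                 "A topical crawler for web resource discovery"),
]
ranked = rank_results("topical web crawler", results)
for score, result in ranked:
    print(f"{score:.3f}  {result.url}")
```

A crawler built along these lines would fetch only the top-ranked hyperlinks, so the cost of downloading a page is paid only for documents whose title and snippet already look relevant.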