Focused crawling: a new approach to topic-specific Web resource discovery
WWW '99 Proceedings of the eighth international conference on World Wide Web
Automatic generation of agents for collecting hidden web pages for data extraction
Data & Knowledge Engineering - Special issue: WIDM 2002
Structure-driven crawler generation by example
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Meta-search based web resource discovery for object-level vertical search
WISE'06 Proceedings of the 7th international conference on Web Information Systems
The SHARC framework for data quality in Web archiving
The VLDB Journal — The International Journal on Very Large Data Bases
A pattern-based selective recrawling approach for object-level vertical search
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Hi-index | 0.00 |
In this paper we propose a novel recrawling method based on navigation patterns called Selective Recrawling. The goal of selective recrawling is to automatically select page collections that have large coverage and little redundancy to a pre-defined vertical domain. It only requires several seed objects and can select a set of URL patterns to cover most objects. The selected set can be used to recrawl the web pages for quite a period of time and renewed periodically. Experiments on local event data show that our method can greatly reduce the downloading of web pages while keep the comparative object coverage.