In this paper, we investigate the feasibility of discovering topical resources by combining Web searches with local site-searches. Existing techniques for topical resource discovery either crawl the Web or search it. The former typically analyses the linkage among Web pages to estimate the relevance of an unseen document to a topic. The latter exploits the indices of generic search engines to discover documents relevant to a topic. Local site-search is a simple and convenient feature that lets human users quickly locate desired information within a site hosting a tremendous number of documents, yet it has been ignored by techniques for automatic topical resource discovery. A typical local site-search returns a list of titles, hyperlinks, and snippets of relevant documents, which can be used to estimate the relevance of those documents to the topic before they are actually fetched. We propose an operational model that makes use of this simple feature, and we address how the model can be realized. Experiments show that this simple but efficient approach can provide much more precise estimates than a sophisticated intelligent topical crawler.
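As an illustration of estimating relevance from site-search results before fetching, the following minimal sketch ranks result entries by the cosine similarity between a topic description and each entry's title and snippet. It is not the operational model proposed in the paper, whose details the abstract does not give; the `SearchHit` structure, the `rank_hits` function, the tokenizer, and the choice of cosine similarity over raw term counts are all assumptions made for the example.

```python
import math
import re
from collections import Counter
from typing import NamedTuple


class SearchHit(NamedTuple):
    """One entry of a site-search result list (hypothetical structure)."""
    title: str
    url: str
    snippet: str


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_hits(topic: str, hits: list[SearchHit]) -> list[tuple[float, SearchHit]]:
    """Score each hit from its title and snippet only (assumed scoring rule),
    then return (score, hit) pairs sorted from most to least relevant."""
    topic_vec = Counter(tokenize(topic))
    scored = [
        (cosine(topic_vec, Counter(tokenize(h.title + " " + h.snippet))), h)
        for h in hits
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

In such a setup, only the hyperlinks of the top-ranked entries would be fetched, so the cost of downloading a document is paid only when its title and snippet already suggest that it is on topic.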