Enhanced hypertext categorization using hyperlinks
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Improved algorithms for topic distillation in a hyperlinked environment
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Efficient crawling through URL ordering
WWW7 Proceedings of the seventh international conference on World Wide Web 7
A re-examination of text categorization methods
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Focused crawling: a new approach to topic-specific Web resource discovery
WWW '99 Proceedings of the eighth international conference on World Wide Web
Authoritative sources in a hyperlinked environment
Proceedings of the ninth annual ACM-SIAM symposium on Discrete algorithms
Learning to construct knowledge bases from the World Wide Web
Artificial Intelligence - Special issue on Intelligent internet systems
Accelerated focused crawling through online relevance feedback
Proceedings of the 11th international conference on World Wide Web
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features
ECML '98 Proceedings of the 10th European Conference on Machine Learning
Using Reinforcement Learning to Spider the Web Efficiently
ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Distributed Hypertext Resource Discovery Through Examples
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Focused Crawling Using Context Graphs
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Web site mining: a new way to spot competitors, customers and suppliers in the world wide web
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Using ODP metadata to personalize search
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
RankMass crawler: a crawler with high personalized pagerank coverage guarantee
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
SESQ: A Model-Driven Method for Building Object Level Vertical Search Engines
ER '08 Proceedings of the 27th International Conference on Conceptual Modeling
Using structured tokens to identify webpages for data extraction
APWeb/WAIM'07 Proceedings of the joint 9th Asia-Pacific web and 8th international conference on web-age information management conference on Advances in data and web management
Discover hierarchical subgraphs with network-topology based ranking score
Proceedings of the Third C* Conference on Computer Science and Software Engineering
Where to crawl next for focused crawlers
KES'10 Proceedings of the 14th international conference on Knowledge-based and intelligent information and engineering systems: Part IV
Meta-search based web resource discovery for object-level vertical search
WISE'06 Proceedings of the 7th international conference on Web Information Systems
Topic-based website feature analysis for enterprise search from the web
WISE'06 Proceedings of the 7th international conference on Web Information Systems
E-FFC: an enhanced form-focused crawler for domain-specific deep web databases
Journal of Intelligent Information Systems
Hi-index | 0.00 |
Focused web crawlers have recently emerged as an alternative to the well-established web search engines. While the well-known focused crawlers retrieve relevant webpages, there are various applications which target whole websites instead of single webpages. For example, companies are represented by websites, not by individual webpages. To answer queries targeted at websites, web directories are an established solution. In this paper, we introduce a novel focused website crawler to employ the paradigm of focused crawling for the search of relevant websites. The proposed crawler is based on a two-level architecture and corresponding crawl strategies with an explicit concept of websites. The external crawler views the web as a graph of linked websites, selects the websites to be examined next and invokes internal crawlers. Each internal crawler views the webpages of a single given website and performs focused (page) crawling within that website. Our experimental evaluation demonstrates that the proposed focused website crawler clearly outperforms previous methods of focused crawling which were adapted to retrieve websites instead of single webpages.