Focused Crawls, Tunneling, and Digital Libraries

Authors:
Donna Bergmark;Carl Lagoze;Alex Sbityakov
Venue:
ECDL '02 Proceedings of the 6th European Conference on Research and Advanced Technology for Digital Libraries
Year:
2002


Abstract

Crawling the Web to build collections of documents related to pre-specified topics became an active area of research in the late 1990s, after crawler technology had been developed for use by search engines. Web crawling is now being seriously considered as an important strategy for building large-scale digital libraries. This paper covers some of the crawl technologies that might be exploited for collection building. For example, focused crawling was developed to make such collection-building crawls more effective; its goal is a "best-first" crawl of the Web. We use powerful crawler software to implement a focused crawl, and add tunneling to overcome some of the limitations of a pure best-first approach. Others have described tunneling as prioritizing links not only by the relevance score of the page they come from, but also by an estimate of the value of each individual link. We add to this mix a tunneling focused-crawl strategy that evaluates the current crawl direction on the fly to decide when to terminate a tunneling activity. Results indicate that a combination of focused crawling and tunneling could be an effective tool for building digital libraries.
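
The idea of a best-first crawl that is allowed to tunnel through a few off-topic pages can be illustrated with a small sketch. This is not the authors' implementation: the toy link graph, the fixed relevance scores (standing in for a real page classifier), the `threshold` value, and the `max_tunnel` cutoff (a simple count of consecutive off-topic pages) are all assumptions made for illustration. The abstract does not specify how the crawl direction is evaluated.

```python
import heapq

def focused_crawl(graph, scores, seeds, threshold=0.5, max_tunnel=2):
    """Best-first crawl over a toy link graph with a simple tunneling cutoff.

    graph:      dict mapping page -> list of linked pages (hypothetical toy data)
    scores:     dict mapping page -> relevance in [0, 1] (stand-in for a classifier)
    threshold:  minimum score for a page to count as on-topic (assumed value)
    max_tunnel: how many consecutive off-topic pages a path may pass through
                before that path is abandoned (assumed cutoff rule)
    """
    # Heap entries: (negated parent score, insertion order, page, off-topic run).
    # Negating the score makes heapq's min-heap pop the best-scored link first.
    frontier = []
    counter = 0
    for seed in seeds:
        heapq.heappush(frontier, (-1.0, counter, seed, 0))
        counter += 1

    visited = set()
    collected = []
    while frontier:
        _, _, page, run = heapq.heappop(frontier)
        if page in visited:
            continue
        visited.add(page)

        score = scores.get(page, 0.0)
        if score >= threshold:
            collected.append(page)  # on-topic: keep it and reset the tunnel
            run = 0
        else:
            run += 1                # off-topic: we are one step deeper in a tunnel
            if run > max_tunnel:
                continue            # tunnel too long: do not follow this page's links

        # Links inherit the score of the page they were found on.
        for link in graph.get(page, []):
            if link not in visited:
                heapq.heappush(frontier, (-score, counter, link, run))
                counter += 1
    return collected

# Toy data: "target" is relevant but sits behind three off-topic pages.
graph = {
    "seed": ["a", "x"],
    "a": ["b"],
    "b": [],
    "x": ["y"],
    "y": ["z"],
    "z": ["target"],
    "target": [],
}
scores = {"seed": 0.9, "a": 0.8, "b": 0.7,
          "x": 0.1, "y": 0.1, "z": 0.1, "target": 0.9}

print(focused_crawl(graph, scores, ["seed"], max_tunnel=2))  # ['seed', 'a', 'b']
print(focused_crawl(graph, scores, ["seed"], max_tunnel=3))  # ['seed', 'a', 'b', 'target']
```

With a cutoff of 2 the crawl gives up before reaching `target`; raising it to 3 lets the crawl tunnel through `x`, `y`, and `z` to find it. A pure best-first crawl with no tunneling corresponds to `max_tunnel=0`. One simplification to be aware of: a page is marked visited the first time it is popped, so a page first reached at the end of an over-long tunnel is not revisited via a shorter path.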