Link Contexts in Classifier-Guided Topical Crawlers

Abstract

The context of a hyperlink, or link context, is defined as the terms that appear in the text around the hyperlink within a Web page. Link contexts have been applied to a variety of Web information retrieval and categorization tasks, and topical (or focused) Web crawlers rely on them especially heavily. These crawlers automatically navigate the hyperlinked structure of the Web, using link contexts to predict the benefit of following the corresponding hyperlinks with respect to some initiating topic or theme. Using topical crawlers guided by a Support Vector Machine (SVM), we investigate how various definitions of link context affect crawling performance. We find that a crawler that exploits words both in the immediate vicinity of a hyperlink and in the entire parent page performs significantly better than a crawler that depends on just one of those cues. We also find that a crawler that uses the tag tree hierarchy within Web pages provides effective coverage. We analyze our results along several dimensions, including link context quality, topic difficulty, crawl length, training data, and topic domain. The study used multiple crawls over 100 topics covering millions of pages, which allowed us to derive statistically strong results.
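To make the competing link-context definitions concrete, the following is a minimal sketch using only the Python standard library. It illustrates three cues: a fixed window of words around the anchor text, the text under an ancestor in the HTML tag tree (climbing `levels` steps from the `<a>` element), and the whole parent page. It also shows how a linear classifier could score a link by combining its context with the parent page. This is not the authors' implementation. The names `TreeBuilder`, `window_context`, `tag_tree_context`, `combined_features`, `svm_score`, and `rank_links`, the `ctx:`/`page:` feature prefixes, and the weights are all hypothetical. A real crawler would learn the weights by training an SVM on labeled topic examples.

```python
import re
from collections import Counter
from html.parser import HTMLParser

class Node:
    """Minimal tag-tree node: an element (tag set) or a text node (text set)."""
    def __init__(self, tag=None, attrs=None, parent=None, text=None):
        self.tag, self.attrs, self.parent, self.text = tag, dict(attrs or []), parent, text
        self.children = []

    def all_text(self):
        """Concatenate every text node under this node, in document order."""
        if self.text is not None:
            return self.text
        return " ".join(c.all_text() for c in self.children)

VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}

class TreeBuilder(HTMLParser):
    """Builds a simple tag tree and records every <a href=...> element."""
    def __init__(self):
        super().__init__()
        self.root = self.cur = Node(tag="#root")
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, parent=self.cur)
        self.cur.children.append(node)
        if tag == "a" and "href" in node.attrs:
            self.anchors.append(node)
        if tag not in VOID_TAGS:
            self.cur = node

    def handle_endtag(self, tag):
        n = self.cur  # climb to the matching open element; ignore stray end tags
        while n is not self.root and n.tag != tag:
            n = n.parent
        if n is not self.root:
            self.cur = n.parent

    def handle_data(self, data):
        if data.strip():
            self.cur.children.append(Node(parent=self.cur, text=data))

def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def _flatten(node, anchor, out, span):
    """Append tokens under node to out; record the anchor's [start, end) in span."""
    if node is anchor:
        span.append(len(out))
    if node.text is not None:
        out.extend(tokens(node.text))
    for c in node.children:
        _flatten(c, anchor, out, span)
    if node is anchor:
        span.append(len(out))

def window_context(root, anchor, k):
    """Anchor text plus up to k words on each side of it."""
    words, span = [], []
    _flatten(root, anchor, words, span)
    start, end = span
    return words[max(0, start - k):end + k]

def tag_tree_context(anchor, levels):
    """All words under the ancestor `levels` steps above the anchor (0 = anchor text)."""
    node = anchor
    for _ in range(levels):
        if node.parent is None:
            break
        node = node.parent
    return tokens(node.all_text())

def combined_features(ctx_words, page_words):
    """Bag of words with separate (assumed) namespaces for link context and page."""
    feats = Counter("ctx:" + w for w in ctx_words)
    feats.update("page:" + w for w in page_words)
    return feats

def svm_score(feats, weights, bias):
    """Linear decision value w.x + b; higher means the link looks more on-topic."""
    return bias + sum(weights.get(f, 0.0) * v for f, v in feats.items())

def rank_links(builder, levels, weights, bias):
    """Order outlinks best-first, as a crawl frontier would."""
    page = tokens(builder.root.all_text())
    scored = [(svm_score(combined_features(tag_tree_context(a, levels), page), weights, bias),
               a.attrs["href"]) for a in builder.anchors]
    return [href for _, href in sorted(scored, reverse=True)]

html = """<html><body>
<p>Recipes and more.</p>
<ul>
  <li>Great <b>vegetarian</b> dishes: <a href="/soup">lentil soup</a> tonight</li>
  <li><a href="/cars">used cars</a> for sale</li>
</ul>
</body></html>"""

builder = TreeBuilder()
builder.feed(html)
builder.close()
soup = builder.anchors[0]
weights = {"ctx:soup": 1.2, "ctx:lentil": 0.8, "ctx:vegetarian": 0.5,
           "ctx:cars": -1.0, "page:recipes": 0.3}  # hypothetical, not learned
print(window_context(builder.root, soup, 2))
print(tag_tree_context(soup, 1))
print(rank_links(builder, 1, weights, -0.5))
```

In this toy page, the two-word window around "lentil soup" spills over into the next list item ("used"), while the tag-tree context stops at the enclosing `<li>`. This is one intuition for why a structure-aware context can be cleaner than a fixed window. Prefixing features by source keeps a word seen near the link distinct from the same word elsewhere on the page. That is a design assumption made here for illustration.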