The context of a hyperlink, or link context, is defined as the terms that appear in the text around a hyperlink within a Web page. Link contexts have been applied to a variety of Web information retrieval and categorization tasks. Topical, or focused, Web crawlers rely especially heavily on link contexts: these crawlers automatically navigate the hyperlinked structure of the Web and use link contexts to predict the benefit of following each hyperlink with respect to an initiating topic or theme. Using topical crawlers guided by a Support Vector Machine, we investigate how various definitions of link context affect crawling performance. We find that a crawler that exploits words both in the immediate vicinity of a hyperlink and in the entire parent page performs significantly better than a crawler that depends on just one of those cues. We also find that a crawler that uses the tag tree hierarchy within Web pages provides effective coverage. We analyze our results along several dimensions, such as link context quality, topic difficulty, length of crawl, training data, and topic domain. The study used multiple crawls over 100 topics covering millions of pages, allowing us to derive statistically strong results.
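The abstract compares link contexts drawn from the words near a hyperlink, from the whole parent page, and from the page's tag tree, with an SVM scoring each link. The stdlib-only Python sketch below illustrates these ideas. It is not the authors' code. It extracts a fixed-size word window around each anchor and a tag-tree context, meaning all text under the ancestor `level` steps above the `<a>` element. It then scores each outlink with a linear function w·x + b over namespaced term-frequency features from the link context (`ctx:`) and the parent page (`page:`). Several pieces are assumptions: the window size `k`, the tree `level`, the feature layout, and the hand-set `WEIGHTS` are all hypothetical stand-ins. In practice the weights would come from training a linear SVM on labeled pages, and a real crawler would push the scored URLs into a priority-queue frontier.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

class Node:
    """One element in a simplified tag tree; children are Nodes or text strings."""
    def __init__(self, tag, parent=None):
        self.tag, self.parent, self.children, self.href = tag, parent, [], None

class TagTreeBuilder(HTMLParser):
    """Builds a tag tree and records every <a> node in document order."""
    VOID = {"br", "img", "hr", "meta", "link", "input"}

    def __init__(self):
        super().__init__()
        self.root = self.cur = Node("root")
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.cur)
        self.cur.children.append(node)
        if tag == "a":
            node.href = dict(attrs).get("href")
            self.anchors.append(node)
        if tag not in self.VOID:  # void elements never get children
            self.cur = node

    def handle_endtag(self, tag):
        # Pop to the nearest open element with this tag; ignore stray end tags.
        node = self.cur
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.cur = node.parent

    def handle_data(self, data):
        self.cur.children.append(data)

def words(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def flatten(node, inside=None, out=None):
    """Document-order (word, enclosing_anchor_or_None) pairs under `node`."""
    out = [] if out is None else out
    for child in node.children:
        if isinstance(child, str):
            out.extend((w, inside) for w in words(child))
        else:
            flatten(child, child if child.tag == "a" else inside, out)
    return out

def window_context(tokens, anchor, k):
    """Anchor text plus up to k words on each side (may cross into other links)."""
    idx = [i for i, (_, a) in enumerate(tokens) if a is anchor]
    if not idx:
        return []
    return [w for w, _ in tokens[max(0, idx[0] - k): idx[-1] + k + 1]]

def tree_context(anchor, level):
    """All words under the ancestor `level` steps above the anchor (0 = anchor text)."""
    node = anchor
    while level > 0 and node.parent is not None:
        node, level = node.parent, level - 1
    return [w for w, _ in flatten(node)]

def tf_vector(tokens, prefix):
    """L2-normalised term frequencies, keys namespaced by `prefix`."""
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {prefix + t: c / norm for t, c in counts.items()}

def score(features, weights, bias=0.0):
    """Linear decision function w.x + b, the form a trained linear SVM outputs."""
    return bias + sum(v * weights.get(f, 0.0) for f, v in features.items())

def rank_links(html, weights, level=1, bias=0.0):
    """Score each outlink by its tag-tree context plus the whole parent page."""
    builder = TagTreeBuilder()
    builder.feed(html)
    builder.close()
    page_vec = tf_vector([w for w, _ in flatten(builder.root)], "page:")
    ranked = []
    for a in builder.anchors:
        feats = {**tf_vector(tree_context(a, level), "ctx:"), **page_vec}
        ranked.append((score(feats, weights, bias), a.href))
    return sorted(ranked, key=lambda pair: pair[0], reverse=True)

# Usage: a toy page and hand-set (not trained) weights for a hypothetical
# "machine learning" topic.
HTML = """<html><body><p>Intro text about cooking.</p>
<ul><li>Great <a href="/svm">support vector machines</a> tutorial</li>
<li><a href="/pasta">Pasta recipes</a></li></ul></body></html>"""
WEIGHTS = {"ctx:support": 1.0, "ctx:vector": 1.0, "ctx:machines": 1.0,
           "ctx:tutorial": 0.5, "page:cooking": -0.2}
print(rank_links(HTML, WEIGHTS))
```

On this toy page, the `k=2` window context of the first link spills over into the neighbouring "Pasta recipes" link. The `level=1` tree context stops at the enclosing `<li>`, which shows how a tag-tree boundary can yield a cleaner context than a fixed window. Both links share the same parent page, so the `page:` features shift their scores equally here. Across different pages, those features would favor links whose parent pages are on topic.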