Information retrieval in the World-Wide Web: making client-based searching feasible
Selected papers of the first conference on World-Wide Web
Efficient crawling through URL ordering
WWW7 Proceedings of the seventh international conference on World Wide Web
The shark-search algorithm. An application: tailored Web site mapping
WWW7 Proceedings of the seventh international conference on World Wide Web
Focused crawling: a new approach to topic-specific Web resource discovery
WWW '99 Proceedings of the eighth international conference on World Wide Web
Intelligent crawling on the World Wide Web with arbitrary predicates
Proceedings of the 10th international conference on World Wide Web
Evaluating topic-driven web crawlers
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Accelerated focused crawling through online relevance feedback
Proceedings of the 11th international conference on World Wide Web
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Using Reinforcement Learning to Spider the Web Efficiently
ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
X-means: Extending K-means with Efficient Estimation of the Number of Clusters
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Focused Crawling Using Context Graphs
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Focused Crawls, Tunneling, and Digital Libraries
ECDL '02 Proceedings of the 6th European Conference on Research and Advanced Technology for Digital Libraries
Table extraction using conditional random fields
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
SVDPACKC (Version 1.0) User's Guide
Panorama: extending digital libraries with topical crawlers
Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries
Topical web crawlers: Evaluating adaptive algorithms
ACM Transactions on Internet Technology (TOIT)
A General Evaluation Framework for Topical Crawlers
Information Retrieval
Exploiting Multiple Features with MEMMs for Focused Web Crawling
NLDB '08 Proceedings of the 13th international conference on Natural Language and Information Systems: Applications of Natural Language to Information Systems
An ontology-based approach to learnable focused crawling
Information Sciences: an International Journal
Improving the performance of focused web crawlers
Data & Knowledge Engineering
A Genre-Aware Approach to Focused Crawling
World Wide Web
SCTWC: An online semi-supervised clustering approach to topical web crawlers
Applied Soft Computing
Exploiting genre in focused crawling
SPIRE'07 Proceedings of the 14th international conference on String processing and information retrieval
Automated browsing in AJAX websites
Data & Knowledge Engineering
A constrained crawling approach and its application to a specialised search engine
International Journal of Information and Communication Technology
Probabilistic models for focused Web crawling
Computational Intelligence
Turn the page: automated traversal of paginated websites
ICWE'12 Proceedings of the 12th international conference on Web Engineering
Semantic ranking of web pages based on formal concept analysis
Journal of Systems and Software
Editorial: A topic-specific crawling strategy based on semantics similarity
Data & Knowledge Engineering
A synergistic approach to efficient web searching
Intelligent Decision Technologies
A focused crawler is designed to traverse the Web to gather documents on a specific topic. It can be used to build domain-specific Web search portals and online personalized search tools. To estimate the relevance of a newly seen URL, a focused crawler must use information gleaned from previously crawled page sequences.

In this paper, we present a new approach to predicting which links lead to relevant pages, based on a Hidden Markov Model (HMM). The system consists of three stages: user data collection, user modelling via sequential pattern learning, and focused crawling. We first collect the Web pages visited during a user's browsing session. These pages are clustered, and the link structure among pages from different clusters is then used to learn, with an HMM, which page sequences are likely to lead to target pages. During crawling, links are prioritized by a learned estimate of how likely the page is to lead to a target page. We compare this approach against Context-Graph crawling and Best-First crawling, and our experiments show that it outperforms both.
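The prioritization step described above can be illustrated with a minimal sketch. Here, crawl states are page clusters and a hand-written transition matrix stands in for the learned HMM parameters; the cluster indices, the matrix values, the `lead_probability` scoring function, and the example URLs are all illustrative assumptions, not the paper's actual model.

```python
import heapq
import numpy as np

# Illustrative HMM-style parameters (NOT learned values): states are
# page clusters, and cluster 0 plays the role of the "target" cluster.
TARGET = 0
trans = np.array([        # trans[i][j] = P(next cluster j | current cluster i)
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.1, 0.4, 0.5],
])

def lead_probability(cluster_path, horizon=3):
    """Estimate the probability that a crawl continuing from the last
    cluster in cluster_path reaches the target cluster within `horizon`
    steps (a simplified stand-in for the HMM forward computation)."""
    p = np.zeros(len(trans))
    p[cluster_path[-1]] = 1.0
    reach = 0.0
    for _ in range(horizon):
        p = p @ trans               # propagate one step along the chain
        reach += p[TARGET]          # mass that just arrived at the target
        p[TARGET] = 0.0             # absorb: count each arrival only once
    return reach

# The crawl frontier is a priority queue ordered by the estimate;
# heapq is a min-heap, so scores are negated.
frontier = []

def enqueue(url, cluster_path):
    heapq.heappush(frontier, (-lead_probability(cluster_path), url))

enqueue("http://example.org/a", [2])     # path ends in a weak cluster
enqueue("http://example.org/b", [1])     # path ends in a promising cluster
best = heapq.heappop(frontier)[1]        # "b" is crawled first
```

Absorbing the probability mass at the target state keeps the score interpretable as "probability of reaching a target page at least once within the horizon," which is the quantity a focused crawler wants to rank links by.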