The key issue in topic-driven Web resource discovery is how to increase the harvest rate. To do so, a crawler should learn online from the information it has already crawled, such as the Web pages and the hyperlink structure. We address this problem by endowing a crawler with an incremental learning ability, and propose an online incremental learning algorithm (IncL). IncL exploits the multi-feature characteristics of Web pages to make link evaluation more accurate and reliable. The score used to rank a hyperlink takes into account not only its positive source pages but also its negative source pages, whose effect on ranking many current crawling approaches ignore. Experiments show that IncL achieves a high harvest rate.
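The abstract does not give IncL's scoring formula, so the following is only an illustrative sketch of the general idea: a link's score rises with the relevance of its positive source pages and falls with the irrelevance of its negative ones. The names `SourcePage`, `link_score`, and `harvest_rate`, the relevance threshold, and the `negative_weight` parameter are all assumptions made for this example, not part of the paper. Harvest rate is taken here in its usual sense, the fraction of crawled pages that are relevant.

```python
from dataclasses import dataclass


@dataclass
class SourcePage:
    url: str
    relevance: float  # assumed topic-relevance score in [0, 1]


def link_score(sources, threshold=0.5, negative_weight=1.0):
    """Score a hyperlink from the pages that link to it (hypothetical formula).

    Pages with relevance >= threshold count as positive sources and add
    their relevance; the others count as negative sources and subtract
    their irrelevance (1 - relevance), scaled by negative_weight.
    """
    if not sources:
        return 0.0
    positive = sum(p.relevance for p in sources if p.relevance >= threshold)
    negative = sum(1.0 - p.relevance for p in sources if p.relevance < threshold)
    return (positive - negative_weight * negative) / len(sources)


def harvest_rate(relevances, threshold=0.5):
    """Fraction of crawled pages whose relevance reaches the threshold."""
    if not relevances:
        return 0.0
    return sum(r >= threshold for r in relevances) / len(relevances)


if __name__ == "__main__":
    only_positive = [SourcePage("http://example.org/a", 0.9)]
    mixed = only_positive + [SourcePage("http://example.org/b", 0.1)]
    print(link_score(only_positive), link_score(mixed))
    print(harvest_rate([0.9, 0.2, 0.7, 0.1]))
```

Under these assumptions, a link reached only from relevant pages outranks one that is also linked from an irrelevant page, which is the effect that ignoring negative source pages would miss.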