An incremental approach to link evaluation in topic-driven web resource discovery

Authors:
Huaxiang Zhang; Shangteng Huang
Affiliations:
Information and Management School, Shandong Normal Univ., Jinan, Shandong, China; Department of Computer Science and Technology, Shanghai Jiaotong Univ., Shanghai, China
Venue:
AAIM '05: Proceedings of the First International Conference on Algorithmic Applications in Management
Year:
2005

Abstract

The key issue in topic-driven Web resource discovery is how to increase the harvest rate; to achieve this, the crawler should learn from the information it has already crawled, such as Web pages and the hyperlink structure. We address this problem by endowing a crawler with an incremental learning ability and propose an online incremental learning algorithm (IncL). IncL exploits the multiple features of Web pages to improve the accuracy and reliability of link evaluation. In the score of a hyperlink, which is used to rank Web pages, we take into account not only the hyperlink's positive source pages but also its negative source pages; many current crawling approaches ignore the effect of negative pages on page ranking. Experiments show that IncL achieves a high harvest rate.
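Why negative source pages matter can be illustrated with a minimal sketch. It is not IncL, whose scoring formula the abstract does not give, and it uses only link evidence rather than IncL's multiple page features. The class name `FrontierScorer`, the Laplace-smoothed score (p + 1) / (p + n + 2), and the assumption that some external classifier judges each crawled page relevant or irrelevant are all hypothetical. Each frontier URL keeps counts of relevant (positive) and irrelevant (negative) crawled pages that link to it; the counts are updated incrementally after every fetch, and the harvest rate is the fraction of crawled pages judged relevant.

```python
from collections import defaultdict


class FrontierScorer:
    """Hypothetical link scorer for illustration only (not the IncL algorithm).

    Assumption: every crawled page is labeled relevant/irrelevant by an
    external classifier, and that label is passed to observe().
    """

    def __init__(self):
        self.pos = defaultdict(int)  # relevant (positive) source pages per URL
        self.neg = defaultdict(int)  # irrelevant (negative) source pages per URL
        self.frontier = set()        # discovered but not yet crawled URLs
        self.visited = set()         # URLs already crawled
        self.crawled = 0
        self.relevant = 0

    def observe(self, page_url, is_relevant, outlinks):
        """Incremental update after crawling one page: no retraining needed."""
        self.frontier.discard(page_url)
        self.visited.add(page_url)
        self.crawled += 1
        self.relevant += int(is_relevant)
        for link in outlinks:
            # The source page's label counts as evidence for or against the link.
            if is_relevant:
                self.pos[link] += 1
            else:
                self.neg[link] += 1
            if link not in self.visited:
                self.frontier.add(link)

    def score(self, url):
        """Smoothed fraction of a URL's source pages that were relevant.

        Negative sources lower the score; a positive-only count ignores them.
        """
        p, n = self.pos[url], self.neg[url]
        return (p + 1) / (p + n + 2)

    def next_url(self):
        """Highest-scoring frontier URL; ties go to the smallest URL string."""
        if not self.frontier:
            return None
        # max() returns the first maximal item, so sorting makes ties deterministic.
        return max(sorted(self.frontier), key=self.score)

    def harvest_rate(self):
        """Fraction of crawled pages judged relevant."""
        return self.relevant / self.crawled if self.crawled else 0.0


if __name__ == "__main__":
    s = FrontierScorer()
    s.observe("seed", True, ["a", "b", "c"])
    print(s.next_url())              # a (all three tie)
    s.observe("a", False, ["b", "d"])
    print(s.next_url())              # c
    print(s.harvest_rate())          # 0.5
```

In this run, `b` and `c` each have one positive source page, so a scorer that counts only positive sources would tie them. Because `b` is also linked from the irrelevant page `a`, its score drops to 0.5 while `c` keeps 2/3, and the crawler fetches `c` first.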