Information retrieval in the World-Wide Web: making client-based searching feasible
Selected papers of the first conference on World-Wide Web
The shark-search algorithm. An application: tailored Web site mapping
WWW7 Proceedings of the seventh international conference on World Wide Web 7
A technique for measuring the relative size and overlap of public Web search engines
WWW7 Proceedings of the seventh international conference on World Wide Web 7
Authoritative sources in a hyperlinked environment
Proceedings of the ninth annual ACM-SIAM symposium on Discrete algorithms
WTMS: a system for collecting and analyzing topic-specific Web information
Proceedings of the 9th international World Wide Web conference on Computer networks : the international journal of computer and telecommunications networking
Introduction to Modern Information Retrieval
Introduction to Modern Information Retrieval
Complementing search engines with online web mining agents
Decision Support Systems - Special issue: Web data mining
Learnable topic-specific web crawler
Journal of Network and Computer Applications - Special issue on computational intelligence on the internet
Tensor Framework and Combined Symmetry for Hypertext Mining
Fundamenta Informaticae
A novel split and merge technique for hypertext classification
Transactions on rough sets XII
The rapid growth of the World-Wide Web poses unprecedented scaling challenges for general-purpose crawlers. Topic-specific web crawlers are developed to collect web pages relevant to topics of interest from the Internet. Based on an analysis of the HITS algorithm, this paper proposes a new algorithm, P-HITS, for topic-specific web crawling. The algorithm selects URLs probabilistically to move closer to a globally optimal crawl order, and it appends the metadata of hyperlinks to better predict the relevance of web pages. Experimental results indicate that our algorithm has better performance.
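The abstract gives no formulas for P-HITS, so the following is only a rough sketch of the two ingredients it names: the standard HITS hub/authority iteration, and a probabilistic choice of the next URL to crawl. The function names `hits` and `pick_url`, the adjacency-dict graph representation, and the score-proportional sampling rule are all assumptions made for illustration; they are not the authors' method, and the hyperlink-metadata component is omitted.

```python
import random


def hits(graph, iterations=50):
    """Standard HITS. `graph` maps each page to the list of pages it links to.

    Returns (hub, authority) dicts with L2-normalized scores.
    """
    pages = set(graph) | {q for targets in graph.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority of a page: sum of the hub scores of pages linking to it.
        auth = {p: 0.0 for p in pages}
        for src, targets in graph.items():
            for dst in targets:
                auth[dst] += hub[src]
        # Hub score of a page: sum of the authority scores of pages it links to.
        hub = {p: sum(auth[q] for q in graph.get(p, ())) for p in pages}
        # Normalize both score vectors so the values stay bounded.
        for scores in (auth, hub):
            norm = sum(v * v for v in scores.values()) ** 0.5 or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


def pick_url(frontier_scores, rng=random):
    """Sample one frontier URL with probability proportional to its score.

    Assumed selection rule (hypothetical): unlike a greedy best-first crawler,
    lower-scored URLs still get an occasional chance to be crawled.
    """
    urls = list(frontier_scores)
    weights = [frontier_scores[u] for u in urls]
    if sum(weights) <= 0:
        return rng.choice(urls)  # no usable scores: fall back to uniform choice
    return rng.choices(urls, weights=weights, k=1)[0]
```

A crawler built on this sketch would recompute the scores as new pages are fetched and call `pick_url` on the current frontier. Sampling rather than always taking the top-scored URL is one plausible reading of how probability could reduce the risk of getting stuck in a locally good region of the Web.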