'Covert crawling', also known as 'stealth crawling', is the hidden counterpart of conventional, overt web spider crawling. In this paper, we combine several hypotheses into a technique for securing a site against it. The main obstacles for a covert crawler are simulating a web browser and the various packets it exchanges. The crawler must also pose as a human and cope with throttling. We describe the problem of crawling the web in detail and then provide a detailed list of techniques that can be used to differentiate between a web robot and a human. We then formulate an algorithm that performs four different tests to address the problem. Using this algorithm, we perform the tests on six widely known web crawlers and show that it can detect whether a visitor is a human or a bot with an efficiency of 83%.
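The abstract does not name the four tests, so the following is only a minimal sketch of how a multi-test detector of this kind might be structured. The four checks (self-declared user agent, visits to a trap URL or `/robots.txt`, request pacing, and skipped page assets), the `Session` record, the `/hidden-trap` path, and the thresholds are all illustrative assumptions, not the paper's method.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Session:
    """Hypothetical per-visitor record built from server logs."""
    user_agent: str
    paths: List[str]          # requested paths, in order
    timestamps: List[float]   # request times in seconds, same order as paths
    fetched_assets: bool      # True if images/CSS/JS were also requested


def declares_bot(s: Session) -> bool:
    # Assumed test 1: the user agent openly identifies a crawler.
    ua = s.user_agent.lower()
    return any(token in ua for token in ("bot", "crawler", "spider"))


def hits_trap(s: Session, trap: str = "/hidden-trap") -> bool:
    # Assumed test 2: the visitor requests robots.txt or a link hidden from humans.
    return "/robots.txt" in s.paths or trap in s.paths


def too_fast(s: Session, min_gap: float = 1.0) -> bool:
    # Assumed test 3: the average gap between requests is below a human-like pace.
    gaps = [b - a for a, b in zip(s.timestamps, s.timestamps[1:])]
    return bool(gaps) and sum(gaps) / len(gaps) < min_gap


def skips_assets(s: Session) -> bool:
    # Assumed test 4: only HTML is fetched, unlike a rendering browser.
    return not s.fetched_assets


TESTS = (declares_bot, hits_trap, too_fast, skips_assets)


def classify(s: Session, threshold: int = 2) -> str:
    # Count how many tests fire; the threshold of 2 is an arbitrary assumption.
    score = sum(test(s) for test in TESTS)
    return "bot" if score >= threshold else "human"


human = Session("Mozilla/5.0", ["/", "/about"], [0.0, 12.0], True)
covert = Session("Mozilla/5.0", ["/", "/hidden-trap", "/a"], [0.0, 0.2, 0.4], False)
print(classify(human), classify(covert))
```

Combining several weak signals through a vote, rather than relying on any single one, is the design point here: a covert crawler that spoofs a browser user agent can still be flagged by its pacing or by following a trap link.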