A web robot is a program that downloads and stores web pages. Implementation issues of web robots have been studied widely, and various web statistics have been reported in the literature. First, this paper describes the overall architecture of our robot and our implementation decisions on several important issues. Second, we present empirical statistics on approximately 73 million Korean web pages. We also identify which factors of a web page could affect whether the page changes. These factors may be used to select the web pages to be updated incrementally.
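
To make "downloads and stores web pages" concrete, here is a minimal sketch of a robot's core loop. It is not the architecture described in the paper, which the abstract does not detail. The breadth-first visiting order, the `crawl` and `LinkExtractor` names, the injected `fetch` callable, and the in-memory `FAKE_WEB` standing in for the network are all assumptions made for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Visit pages breadth-first from `seed` and return {url: html}.

    `fetch(url)` is an assumed callable that returns the page's HTML,
    or None when the page cannot be downloaded.
    """
    store = {}                 # downloaded pages, keyed by URL
    frontier = deque([seed])   # URLs waiting to be downloaded
    seen = {seed}              # URLs already queued, to avoid repeats
    while frontier and len(store) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        store[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop the #fragment part.
            link, _fragment = urldefrag(urljoin(url, href))
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return store


# A tiny in-memory "web" so the sketch runs without network access.
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b#top">B</a>',
    "http://example.test/a": '<a href="/">home</a>',
    "http://example.test/b": "no links here",
}

pages = crawl("http://example.test/", FAKE_WEB.get)
```

A real robot would also need politeness delays, robots.txt handling, URL normalization, and persistent storage, all of which this sketch leaves out.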