A brief history of web crawlers

  • Authors:
  • Seyed M. Mirtaheri, Mustafa Emre Dinçtürk, Salman Hooshmand, Gregor V. Bochmann, Guy-Vincent Jourdan (University of Ottawa, Ottawa, Ontario, Canada); Iosif Viorel Onut (Security AppScan® Enterprise, IBM, Ottawa, Ontario, Canada)

  • Venue:
  • CASCON '13 Proceedings of the 2013 Conference of the Center for Advanced Studies on Collaborative Research
  • Year:
  • 2013

Abstract

Web crawlers have a long and interesting history. Early web crawlers collected statistics about the web. In addition to collecting statistics about the web and indexing applications for search engines, modern crawlers can be used to perform accessibility and vulnerability checks on an application. The rapid expansion of the web and the growing complexity of web applications have made crawling a very challenging process. Throughout the history of web crawling, many researchers and industrial groups have addressed the issues and challenges that web crawlers face, and various solutions have been proposed to reduce the time and cost of crawling. Performing an exhaustive crawl remains a challenging problem, as does automatically capturing the model of a modern web application and extracting data from it. What follows is a brief history of the techniques and algorithms used from the early days of crawling up to the present day. We introduce criteria to evaluate the relative performance and objectives of web crawlers, and based on these criteria we plot their evolution.