A strategy for efficient crawling of rich internet applications

Authors:
Kamara Benjamin; Gregor Von Bochmann; Mustafa Emre Dincturk; Guy-Vincent Jourdan; Iosif Viorel Onut
Affiliations:
SITE, University of Ottawa, Ottawa, ON, Canada; SITE, University of Ottawa, Ottawa, ON, Canada; SITE, University of Ottawa, Ottawa, ON, Canada; SITE, University of Ottawa, Ottawa, ON, Canada; Research and Development, IBM, Rational, AppScan Enterprise, IBM, Ottawa, ON, Canada
Venue:
ICWE'11 Proceedings of the 11th international conference on Web engineering
Year:
2011

Abstract

New web application development technologies such as Ajax, Flex, or Silverlight result in so-called Rich Internet Applications (RIAs), which provide enhanced responsiveness but introduce crawling challenges that traditional crawlers cannot address. This paper describes a novel crawling technique for RIAs. The technique first generates an optimal crawling strategy for an anticipated model of the crawled RIA, aiming to discover new states as quickly as possible. As the strategy is executed, if the discovered portion of the application's actual model deviates from the anticipated model, both the anticipated model and the strategy are updated to conform to the actual model. We compare the performance of our technique with that of a number of existing techniques, as well as with depth-first and breadth-first crawling, on some Ajax test applications. The results show that our technique performs better, often with a faster rate of state discovery.
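
The loop the abstract describes (anticipate a model, follow a strategy, correct the model when observations deviate) can be illustrated with a small sketch. This is not the paper's algorithm. It assumes a toy application given as a dictionary from state to {event: next state}, a deliberately naive anticipated model in which every unexecuted event is expected to lead to a new state, and a greedy strategy that walks to the nearest state with an unexecuted event. The function name `crawl`, the toy graph, and the reset handling are all hypothetical.

```python
from collections import deque


def crawl(app, initial):
    """Greedy crawl of a toy event graph `app`: state -> {event: next_state}.

    Assumed anticipated model: every unexecuted event leads to a new state.
    Each observed transition replaces that guess with the actual target.
    """
    known = {}              # (state, event) -> observed target state
    discovered = [initial]  # states in order of discovery
    current = initial
    events_executed = 0
    resets = 0

    def pending(state):
        # Events of a discovered state that have not been executed yet.
        return [e for e in app[state] if (state, e) not in known]

    def path_to_pending(start):
        # Breadth-first search over already observed transitions only.
        queue, seen = deque([(start, [])]), {start}
        while queue:
            state, path = queue.popleft()
            if pending(state):
                return path
            for event in app[state]:
                nxt = known.get((state, event))
                if nxt is not None and nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [(state, event)]))
        return None

    while True:
        path = path_to_pending(current)
        if path is None:
            # Nothing reachable from here; try again from the initial state.
            path = path_to_pending(initial)
            if path is None:
                break  # every event of every discovered state was executed
            current = initial
            resets += 1
        for step in path:
            # Replay known transitions to reach the chosen state.
            current = known[step]
            events_executed += 1
        event = pending(current)[0]
        target = app[current][event]
        known[(current, event)] = target  # correct the anticipated model
        events_executed += 1
        if target not in discovered:
            discovered.append(target)
        current = target

    return discovered, events_executed, resets


# Hypothetical four-state application used only for illustration.
toy_app = {
    "A": {"e1": "B", "e2": "C"},
    "B": {"e1": "A"},
    "C": {"e3": "D"},
    "D": {},
}
states, cost, resets = crawl(toy_app, "A")
```

In this toy run, the event `e1` on state `B` leads back to the already known state `A`, which is the kind of deviation from the anticipated model that triggers a correction. A richer anticipated model would let the strategy rank events by how likely they are to reveal new states instead of treating all of them alike.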