A statistical approach for efficient crawling of rich internet applications

Authors:
Mustafa Emre Dincturk; Suryakant Choudhary; Gregor von Bochmann; Guy-Vincent Jourdan; Iosif Viorel Onut
Affiliations:
EECS, University of Ottawa, Ottawa, ON, Canada, and IBM Canada CAS Research, Canada; EECS, University of Ottawa, Ottawa, ON, Canada, and IBM Canada CAS Research, Canada; EECS, University of Ottawa, Ottawa, ON, Canada, and IBM Canada CAS Research, Canada; EECS, University of Ottawa, Ottawa, ON, Canada, and IBM Canada CAS Research, Canada; Research and Development, IBM® Security AppScan® Enterprise, IBM, Ottawa, ON, Canada, and IBM Canada CAS Research, Canada
Venue:
ICWE'12 Proceedings of the 12th international conference on Web Engineering
Year:
2012

References (4)

Exact solution of large-scale, asymmetric traveling salesman problems
ACM Transactions on Mathematical Software (TOMS)

Some Modeling Challenges When Testing Rich Internet Applications for Security
ICSTW '10 Proceedings of the 2010 Third International Conference on Software Testing, Verification, and Validation Workshops

State of the Art: Automated Black-Box Web Application Vulnerability Testing
SP '10 Proceedings of the 2010 IEEE Symposium on Security and Privacy

A strategy for efficient crawling of rich internet applications
ICWE'11 Proceedings of the 11th international conference on Web engineering

Cited by (4)

Crawling rich internet applications: the state of the art
CASCON '12 Proceedings of the 2012 Conference of the Center for Advanced Studies on Collaborative Research

Web object identification for web automation and meta-search
Proceedings of the 3rd International Conference on Web Intelligence, Mining and Semantics

Building rich internet applications models: example of a better strategy
ICWE'13 Proceedings of the 13th international conference on Web Engineering

A brief history of web crawlers
CASCON '13 Proceedings of the 2013 Conference of the Center for Advanced Studies on Collaborative Research


Abstract

Modern web technologies such as AJAX result in more responsive and usable web applications, sometimes called Rich Internet Applications (RIAs). Traditional crawling techniques are not sufficient for crawling RIAs. We present a new strategy for crawling RIAs. The strategy builds on the concept of "Model-Based Crawling" introduced in [3] and uses statistics accumulated during the crawl to select what to explore next, choosing options with a high probability of uncovering new information. We compare its performance with that of our previous strategy, as well as the classical Breadth-First and Depth-First strategies, on two real RIAs and two test RIAs. The results show that the new strategy is significantly better than the Breadth-First and Depth-First strategies (which are widely used to crawl RIAs), and that it outperforms our previous strategy while being much simpler to implement.
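
The idea of using crawl statistics to decide what to explore next can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `EventStats`, its methods, and the smoothed estimate (one assumed prior success in two assumed prior trials, so an untried event starts at 0.5) are all assumptions made for illustration. The abstract does not give the actual formula, and the sketch ignores the cost of reaching the state where an event can be executed.

```python
from collections import defaultdict


class EventStats:
    """Hypothetical per-event counters accumulated during a crawl."""

    def __init__(self, prior_successes=1.0, prior_trials=2.0):
        # Assumed smoothing prior: an event never tried starts at 1/2.
        self.prior_successes = prior_successes
        self.prior_trials = prior_trials
        self.trials = defaultdict(int)     # times the event was executed
        self.successes = defaultdict(int)  # times it led to a new state

    def record(self, event, found_new_state):
        """Update the counters after executing an event."""
        self.trials[event] += 1
        if found_new_state:
            self.successes[event] += 1

    def probability(self, event):
        """Smoothed estimate that executing the event uncovers a new state."""
        return (self.successes[event] + self.prior_successes) / (
            self.trials[event] + self.prior_trials
        )

    def choose(self, candidates):
        """Pick the candidate event with the highest estimated probability."""
        return max(candidates, key=self.probability)


stats = EventStats()
stats.record("next_page", True)
stats.record("next_page", True)
stats.record("help", False)
stats.record("help", False)
best = stats.choose(["help", "open_menu", "next_page"])
```

In this example `next_page` has discovered a new state on both executions, `help` on neither, and `open_menu` has not been tried, so the estimates rank them in that order and `best` is `"next_page"`.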