A statistical approach for efficient crawling of rich internet applications

  • Authors:
  • Mustafa Emre Dincturk; Suryakant Choudhary; Gregor von Bochmann; Guy-Vincent Jourdan; Iosif Viorel Onut

  • Affiliations:
  • EECS, University of Ottawa, Ottawa, ON, Canada, IBM Canada CAS Research, Canada; EECS, University of Ottawa, Ottawa, ON, Canada, IBM Canada CAS Research, Canada; EECS, University of Ottawa, Ottawa, ON, Canada, IBM Canada CAS Research, Canada; EECS, University of Ottawa, Ottawa, ON, Canada, IBM Canada CAS Research, Canada; Research and Development, IBM® Security AppScan® Enterprise, IBM, Ottawa, ON, Canada, IBM Canada CAS Research, Canada

  • Venue:
  • ICWE'12: Proceedings of the 12th International Conference on Web Engineering
  • Year:
  • 2012

Abstract

Modern web technologies, like AJAX, result in more responsive and usable web applications, sometimes called Rich Internet Applications (RIAs). Traditional crawling techniques are not sufficient for crawling RIAs. We present a new strategy for crawling RIAs. This new strategy is based on the concept of "Model-Based Crawling" introduced in [3] and uses statistics accumulated during the crawl to choose what to explore next, favoring choices with a high probability of uncovering new information. The performance of our strategy is compared with that of our previous strategy, as well as with the classical Breadth-First and Depth-First strategies, on two real RIAs and two test RIAs. The results show that this new strategy is significantly better than the Breadth-First and Depth-First strategies (which are widely used to crawl RIAs) and outperforms our previous strategy while being much simpler to implement.
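To make the idea of statistics-driven selection concrete, here is a minimal Python sketch. It is not the paper's algorithm: the names `EventStats`, `record` and `pick_next`, the smoothed success-ratio estimate with an assumed prior of 1 discovery in 2 executions, and the cost model (steps needed to reach a state, plus one for the event itself) are all illustrative assumptions. It also assumes statistics are kept per event identifier, so an event that appears in many states shares one set of counters.

```python
from dataclasses import dataclass


@dataclass
class EventStats:
    """Hypothetical per-event counters accumulated during the crawl."""
    executions: int = 0   # times the event has been executed so far
    discoveries: int = 0  # times executing it led to a previously unseen state

    def estimate(self, prior_successes: float = 1.0, prior_trials: float = 2.0) -> float:
        # Smoothed estimate (assumed prior) of the chance that the next execution
        # reveals a new state; an event never executed starts at 1/2 = 0.5.
        return (self.discoveries + prior_successes) / (self.executions + prior_trials)


def record(stats, event, found_new_state):
    """Update the counters of `event` after executing it once."""
    s = stats.setdefault(event, EventStats())
    s.executions += 1
    if found_new_state:
        s.discoveries += 1


def pick_next(candidates, stats, distance):
    """Choose the unexplored (state, event) pair with the best estimate per step.

    candidates: iterable of (state, event) pairs not yet explored
    stats:      dict mapping event id -> EventStats
    distance:   dict mapping state -> event executions needed to reach it
    """
    def score(pair):
        state, event = pair
        p = stats.get(event, EventStats()).estimate()
        return p / (1 + distance[state])  # +1 counts executing the event itself
    return max(candidates, key=score)


# Illustrative usage with made-up counters.
stats = {"click_menu": EventStats(executions=10, discoveries=0),
         "click_next": EventStats(executions=4, discoveries=3)}
distance = {"s0": 0, "s1": 2}
candidates = [("s0", "click_menu"), ("s1", "click_next"), ("s1", "click_new")]
print(pick_next(candidates, stats, distance))  # ('s1', 'click_next')
```

Here `click_menu` scores 1/12, the never-executed `click_new` scores 0.5/3 and `click_next` scores (4/6)/3, so the crawler travels to `s1` for the event that has most often revealed new states. Dividing by the travel cost captures the trade-off between a promising event far away and a weaker one close by; a real crawler would also have to account for resets and the paths actually taken, which this sketch ignores.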