Crawling rich internet applications: the state of the art

Authors:
Suryakant Choudhary;Mustafa Emre Dincturk;Seyed M. Mirtaheri;Ali Moosavi;Gregor von Bochmann;Guy-Vincent Jourdan;Iosif Viorel Onut
Affiliations:
EECS, University of Ottawa, Ottawa, Ontario, Canada;EECS, University of Ottawa, Ottawa, Ontario, Canada;EECS, University of Ottawa, Ottawa, Ontario, Canada;EECS, University of Ottawa, Ottawa, Ontario, Canada;EECS, University of Ottawa, Ottawa, Ontario, Canada and Fellow of IBM Canada CAS Research, Markham, Ontario, Canada;EECS, University of Ottawa, Ottawa, Ontario, Canada and Fellow of IBM Canada CAS Research, Markham, Ontario, Canada;R&D, IBM® Security AppScan® Enterprise, Ottawa, Ontario, Canada and IBM Canada Software Lab, Canada
Venue:
CASCON '12 Proceedings of the 2012 Conference of the Center for Advanced Studies on Collaborative Research
Year:
2012

Abstract

Web applications have come a long way, both in their adoption as a means of providing information and services and in the technologies used to develop them. With the emergence of richer and more advanced technologies such as AJAX, web applications have become more interactive, responsive and user-friendly. These applications, often called Rich Internet Applications (RIAs), changed web applications in two ways: they (1) manipulate client-side state dynamically and (2) communicate with the server asynchronously. At the same time, these techniques introduced new challenges. One important challenge is the difficulty of automatically crawling these new applications. Without crawling, RIAs can be neither indexed nor tested automatically, and traditional crawlers are not able to handle these newer technologies. This paper surveys the research on the problem of crawling RIAs and provides some experimental results comparing existing crawling strategies. In addition, we provide some future directions for research on crawling RIAs.