Extracting URLs from JavaScript via program analysis

  • Authors:
  • Qi Wang; Jingyu Zhou; Yuting Chen; Yizhou Zhang; Jianjun Zhao

  • Affiliations:
  • Shanghai Jiao Tong University, China; Shanghai Jiao Tong University, China; Shanghai Jiao Tong University, China; Shanghai Jiao Tong University, China / Cornell University, USA; Shanghai Jiao Tong University, China

  • Venue:
  • Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering
  • Year:
  • 2013

Abstract

With the extensive use of client-side JavaScript in web applications, web content is becoming more dynamic than ever before. This poses significant challenges for search engines: more URLs are now embedded or hidden inside JavaScript code, yet most web crawlers are script-agnostic, which significantly reduces search engine coverage. We present a hybrid approach that combines static analysis with dynamic execution, overcoming the weaknesses of a purely static approach, which lacks accuracy, and of a purely dynamic approach, which suffers from huge execution cost. We also propose integrating program analysis techniques such as statement coverage and program slicing to improve the performance of URL mining.
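
To make the accuracy limitation of a purely static approach concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' tool: the function name `static_url_candidates`, the regular expressions, and the sample script are all hypothetical. It scans JavaScript source for quoted string literals that look like URLs without executing any code.

```python
import re

# Matches single- or double-quoted JavaScript string literals.
# Assumption: template literals and regex literals are ignored for simplicity.
STRING_LITERAL = re.compile(r"""(["'])((?:\\.|(?!\1).)*)\1""")

# Hypothetical heuristic for "looks like a URL":
# an absolute http(s) address or a root-relative path.
URL_LIKE = re.compile(r"^(?:https?://|/)\S+$")


def static_url_candidates(js_source: str) -> list[str]:
    """Return string literals in js_source that look like URLs."""
    literals = [m.group(2) for m in STRING_LITERAL.finditer(js_source)]
    return [s for s in literals if URL_LIKE.match(s)]


# Illustrative script: one URL is assembled by concatenation at run time.
js = '''
var a = "http://example.com/index.html";
var base = "http://example.com/item?id=";
var b = base + 42;           // full URL exists only at run time
window.location = '/about';
'''

print(static_url_candidates(js))
```

On this sample the scan reports the literal prefix `http://example.com/item?id=` but never the concatenated URL ending in `42`, because that value only exists once the script runs. Recovering it requires executing at least the relevant statements, which is the kind of gap a hybrid of static analysis and dynamic execution is meant to close.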