Tracking the changes of dynamic web pages in the existence of URL rewriting

  • Authors:
  • Ping-Jer Yeh;Jie-Tsung Li;Shyan-Ming Yuan

  • Affiliations:
  • National Chiao Tung University, Hsinchu, Taiwan;National Chiao Tung University, Hsinchu, Taiwan;National Chiao Tung University, Hsinchu, Taiwan

  • Venue:
  • AusDM '06 Proceedings of the fifth Australasian conference on Data mining and analystics - Volume 61
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

Crawlers in a knowledge management system need to collect and archive documents from websites, and also track the change status of these documents. However, the existence of URL rewriting mechanism raises a page tracking problem since the URLs of a pair of dynamic page instances obtained during different sessions will no longer be the same. This paper proposes a series of algorithms in a bottom-up manner to find the corresponding pairs of dynamic page instances, and then to judge the change status of them. Experiments showed that the performance was very good and the outcome was 100% accurate.