Change Detection in Web Pages

Authors:
Divakar Yadav;A. K. Sharma;J. P. Gupta
Affiliations:
-;-;-
Venue:
ICIT '07 Proceedings of the 10th International Conference on Information Technology
Year:
2007

Citing 0
Cited 2

Parallel crawler architecture and web page change detection
WSEAS Transactions on Computers

Topical web crawling using weighted anchor text and web page change detection techniques
WSEAS Transactions on Information Science and Applications

Abstract

A large amount of new information is posted on the Web every day. News portals, for example, change not only daily but hourly. How important a given piece of information is depends on the perception of the individual user. The Internet and the World Wide Web have enabled a publishing explosion of useful online information, with the unfortunate side effect of information overload: it is increasingly difficult for individuals to keep abreast of fresh information. In this paper, we describe an approach for building a system that efficiently monitors changes to Web documents. We discuss the mechanism our proposed algorithm uses to detect changes to Web pages, and our solution for finding new information in a web page by tracking changes in the document's structure. In the methodology section, we present the algorithm and techniques for detecting web pages that have changed, extracting the changes from different versions of a web page, and evaluating the significance of those changes. Our algorithm for extracting web changes consists of three steps: document tree construction, document tree encoding, and tree matching (based on the R.M.S. value of the content). It detects two types of changes: structural changes and content changes. The algorithm has linear time complexity and effectively extracts the changed content from different versions of a web page.
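The three steps named in the abstract (tree construction, tree encoding, tree matching) can be illustrated with a minimal Python sketch. The abstract does not define the encoding, so everything here is an assumption made for illustration: the names `Node`, `TreeBuilder`, `rms`, `encode`, `compare`, and `detect_changes` are hypothetical, the R.M.S. value is taken to be the root mean square of the character codes of a node's own text, and a structural change is taken to be a differing tag or child count. It is not the authors' algorithm.

```python
import math
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}

class Node:
    def __init__(self, tag):
        self.tag = tag
        self.text = ""      # text directly inside this element
        self.children = []
        self.rms = 0.0      # filled in by encode()

class TreeBuilder(HTMLParser):
    """Step 1: build a document tree from HTML."""
    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest open element with this tag, if any.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].text += data.strip()

def rms(text):
    """Assumed signature: root mean square of the character codes."""
    if not text:
        return 0.0
    return math.sqrt(sum(ord(c) ** 2 for c in text) / len(text))

def encode(node):
    """Step 2: annotate every node with the R.M.S. value of its text."""
    node.rms = rms(node.text)
    for child in node.children:
        encode(child)

def compare(old, new, path=""):
    """Step 3: match two encoded trees, visiting each node at most once."""
    here = f"{path}/{new.tag}"
    if old.tag != new.tag or len(old.children) != len(new.children):
        return [("structural", here)]
    changes = []
    if not math.isclose(old.rms, new.rms):
        changes.append(("content", here))
    for o, n in zip(old.children, new.children):
        changes.extend(compare(o, n, here))
    return changes

def detect_changes(old_html, new_html):
    trees = []
    for html in (old_html, new_html):
        builder = TreeBuilder()
        builder.feed(html)
        builder.close()
        encode(builder.root)
        trees.append(builder.root)
    return compare(trees[0], trees[1])

old = "<html><body><h1>News</h1><p>Rain today</p></body></html>"
new = "<html><body><h1>News</h1><p>Sun today</p></body></html>"
print(detect_changes(old, new))
```

Each tree is built, encoded, and compared in a single pass, which is in keeping with the linear time complexity the abstract reports. One limitation of the assumed signature is that texts containing the same characters in a different order have the same R.M.S. value, so this sketch would miss such edits.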