This paper describes a fast change detection approach for HTML web pages. It saves computation time in two ways: by limiting the similarity computations between two versions of a web page to nodes that have the same HTML tag type, and by hashing the web page to provide direct access to node information. The approach is efficient enough to run as a client application and to underpin server applications that let users monitor modifications made to HTML web pages over time, and that report and visualize changes and trends to give insight into the significance and types of those changes. Changes between two versions of a page are detected by performing similarity computations after transforming the web page into an XML-like structure in which each node corresponds to an open-close HTML tag pair. Performance and detection reliability results showed speed improvements over a previous approach.
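The two ideas named in the abstract, comparing only nodes of the same tag type and keeping nodes in a hash table for direct access, can be illustrated with a minimal Python sketch. This is not the paper's algorithm. The names `NodeCollector`, `index_page`, and `detect_changes` are hypothetical, the `threshold` value is arbitrary, and `difflib.SequenceMatcher` stands in for whatever similarity measure the paper actually uses.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from html.parser import HTMLParser


class NodeCollector(HTMLParser):
    """Bucket the direct text of each open-close tag pair by tag name."""

    def __init__(self):
        super().__init__()
        self.buckets = defaultdict(list)  # hash table: tag name -> node texts
        self._stack = []                  # currently open nodes: (tag, text parts)

    def handle_starttag(self, tag, attrs):
        self._stack.append((tag, []))

    def handle_data(self, data):
        # Attribute non-blank text to the innermost open node.
        if self._stack and data.strip():
            self._stack[-1][1].append(data.strip())

    def handle_endtag(self, tag):
        # Close the innermost open node with this tag name, if there is one.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                parts = self._stack[i][1]
                del self._stack[i:]
                self.buckets[tag].append(" ".join(parts))
                break


def index_page(html):
    """Parse one page version into its tag-keyed node table."""
    collector = NodeCollector()
    collector.feed(html)
    collector.close()
    return collector.buckets


def detect_changes(old_html, new_html, threshold=0.5):
    """Report modified, deleted, and inserted nodes between two versions."""
    old, new = index_page(old_html), index_page(new_html)
    changes = []
    for tag in sorted(set(old) | set(new)):
        # Only nodes with the same tag type are ever compared.
        candidates = list(new.get(tag, []))
        unmatched = []
        for text in old.get(tag, []):
            if text in candidates:
                candidates.remove(text)   # unchanged node, no similarity needed
            else:
                unmatched.append(text)
        for text in unmatched:
            scored = [(SequenceMatcher(None, text, c).ratio(), c) for c in candidates]
            score, best = max(scored, default=(0.0, None))
            if best is not None and score >= threshold:
                candidates.remove(best)
                changes.append(("modified", tag, text, best))
            else:
                changes.append(("deleted", tag, text, None))
        changes.extend(("inserted", tag, None, c) for c in candidates)
    return changes
```

Calling `detect_changes(old_html, new_html)` on two versions of a page returns a list of `(kind, tag, old_text, new_text)` tuples. Because each old node is scored only against new nodes in the same tag bucket, the number of similarity computations grows with the size of each bucket rather than with the size of the whole page.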