NeTra: a toolbox for navigating large image databases
ICIP '97 Proceedings of the 1997 International Conference on Image Processing (ICIP '97) 3-Volume Set-Volume 1 - Volume 1
Learning block importance models for web pages
Proceedings of the 13th international conference on World Wide Web
Distinctive Image Features from Scale-Invariant Keypoints
International Journal of Computer Vision
Feature selection using linear classifier weights: interaction with classification models
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Detecting Phishing Web Pages with Visual Similarity Assessment Based on Earth Mover's Distance (EMD)
IEEE Transactions on Dependable and Secure Computing
A segmentation method for web page analysis using shrinking and dividing
International Journal of Parallel, Emergent and Distributed Systems - Network and parallel computing
Document structure meets page layout: loopy random fields for web news content extraction
Proceedings of the 10th ACM symposium on Document engineering
Vi-DIFF: understanding web pages changes
DEXA'10 Proceedings of the 21st international conference on Database and expert systems applications: Part I
Hi-index | 0.00 |
In this paper, we propose a Web page archiving system that combines state-of-the-art comparison methods based on the source codes of Web pages, with computer vision techniques. To detect whether successive versions of a Web page are similar or not, our system is based on: (1) a combination of structural and visual comparison methods embedded in a statistical discriminative model, (2) a visual similarity measure designed for Web pages that improves change detection, (3) a supervised feature selection method adapted to Web archiving. We train a Support Vector Machine model with vectors of similarity scores between successive versions of pages. The trained model then determines whether two versions, defined by their vector of similarity scores, are similar or not. Experiments on real archives validate our approach.