Efficient crawling through URL ordering
WWW7 Proceedings of the seventh international conference on World Wide Web 7
Proceedings of the 9th international World Wide Web conference on Computer networks : the international journal of computer and telecommunications netowrking
An adaptive model for optimizing performance of an incremental web crawler
Proceedings of the 10th international conference on World Wide Web
Optimal crawling strategies for web search engines
Proceedings of the 11th international conference on World Wide Web
Keeping Up with the Changing Web
Computer
The Evolution of the Web and Implications for an Incremental Crawler
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
A First Experience in Archiving the French Web
ECDL '02 Proceedings of the 6th European Conference on Research and Advanced Technology for Digital Libraries
Web usage mining: discovery and applications of usage patterns from Web data
ACM SIGKDD Explorations Newsletter
Estimating frequency of change
ACM Transactions on Internet Technology (TOIT)
Effective page refresh policies for Web crawlers
ACM Transactions on Database Systems (TODS)
Learning block importance models for web pages
Proceedings of the 13th international conference on World Wide Web
Information diffusion through blogspace
Proceedings of the 13th international conference on World Wide Web
Scheduling Algorithms for Web Crawling
LA-WEBMEDIA '04 Proceedings of the WebMedia & LA-Web 2004 Joint Conference 10th Brazilian Symposium on Multimedia and the Web 2nd Latin American Web Congress
WWW '05 Proceedings of the 14th international conference on World Wide Web
C-Miner: Mining Block Correlations in Storage Systems
FAST '04 Proceedings of the 3rd USENIX Conference on File and Storage Technologies
Managing duplicates in a web archive
Proceedings of the 2006 ACM symposium on Applied computing
Web Archiving
CP-Miner: a tool for finding copy-paste and related bugs in operating system code
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Efficient Monitoring Algorithm for Fast News Alerts
IEEE Transactions on Knowledge and Data Engineering
Frequent pattern mining: current status and future directions
Data Mining and Knowledge Discovery
Efficient mining of XML query patterns for caching
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Recrawl scheduling based on information longevity
Proceedings of the 17th international conference on World Wide Web
The web changes everything: understanding the dynamics of web content
Proceedings of the Second ACM International Conference on Web Search and Data Mining
Exploiting idle CPU cores to improve file access performance
Proceedings of the 3rd International Conference on Ubiquitous Information Management and Communication
Resonance on the web: web dynamics and revisitation patterns
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Proceedings of the 3rd workshop on Information credibility on the web
SHARC: framework for quality-conscious web archiving
Proceedings of the VLDB Endowment
Using visual pages analysis for optimizing web archiving
Proceedings of the 2010 EDBT/ICDT Workshops
Vi-DIFF: understanding web pages changes
DEXA'10 Proceedings of the 21st international conference on Database and expert systems applications: Part I
Improving the quality of web archives through the importance of changes
DEXA'11 Proceedings of the 22nd international conference on Database and expert systems applications - Volume Part I
Coherence-oriented crawling and navigation using patterns for web archives
TPDL'11 Proceedings of the 15th international conference on Theory and practice of digital libraries: research and advanced technology for digital libraries
A quantitative evaluation of techniques for detection of abnormal change events in blogs.
Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries
Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries
Reading the correct history?: modeling temporal intention in resource sharing
Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries
Archival HTTP redirection retrieval policies
Proceedings of the 22nd international conference on World Wide Web companion
Archiving the relaxed consistency web
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Hi-index | 0.00 |
A pattern is a model or a template used to summarize and describe the behavior (or the trend) of a data having generally some recurrent events. Patterns have received a considerable attention in recent years and were widely studied in the data mining field. Various pattern mining approaches have been proposed and used for different applications such as network monitoring, moving object tracking, financial or medical data analysis, scientific data processing, etc. In these different contexts, discovered patterns were useful to detect anomalies, to predict data behavior (or trend), or more generally, to simplify data processing or to improve system performance. However, to the best of our knowledge, patterns have never been used in the context of web archiving. Web archiving is the process of continuously collecting and preserving portions of the World Wide Web for future generations. In this paper, we show how patterns of page changes can be useful tools to efficiently archive web sites. We first define our pattern model that describes the changes of pages. Then, we present the strategy used to (i) extract the temporal evolution of page changes, to (ii) discover patterns and to (iii) exploit them to improve web archives. We choose the archive of French public TV channels « France Télévisions » as a case study in order to validate our approach. Our experimental evaluation based on real web pages shows the utility of patterns to improve archive quality and to optimize indexing or storing.