Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
The historical, cultural, and intellectual importance of archiving the web has been widely recognized. Today, all countries with high Internet penetration rates have established high-profile archiving initiatives to crawl and archive fast-disappearing web content for long-term use. As web technologies evolve, established web archiving techniques face new challenges. This paper focuses on the potential impact of relaxed-consistency web design on crawler-driven web archiving. Websites built on relaxed consistency may disseminate, albeit ephemerally, inaccurate and even contradictory information. If captured and preserved in web archives as historical records, such information degrades overall archival quality. To assess the extent of this degradation, we build a simplified feed-following application and simulate its operation with synthetic workloads. The results indicate that a non-trivial portion of a web archive crawled from relaxed-consistency sites may contain observable inconsistency, and that the inconsistency window in the archive may extend significantly longer than the one observed at the data store. We discuss the nature of this quality degradation and propose a few possible remedies.