We study the problem of maintaining large replicated collections of files or documents in a distributed environment with limited bandwidth. This problem arises in a number of important applications, such as synchronization of data between accounts or devices, content distribution and web caching networks, web site mirroring, storage networks, and large-scale web search and mining. At the core of the problem lies the following challenge, called the file synchronization problem: given two versions of a file on different machines, say an outdated one and a current one, how can we update the outdated version with minimum communication cost by exploiting the significant similarity between the versions? Although a popular open-source tool for this problem, rsync, is used in hundreds of thousands of installations, there have been very few attempts to improve upon it in practice.

In this paper, we propose a framework for remote file synchronization and describe several new techniques that result in significant bandwidth savings. Our focus is on applications where very large collections have to be maintained over slow connections. We show that a prototype implementation of our framework and techniques achieves significant improvements over rsync. As an example application, we study the efficient synchronization of very large web page collections for the purpose of search, mining, and content distribution.
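To make the file synchronization problem concrete, here is a minimal sketch of the single-round, rsync-style approach that serves as the baseline in this abstract. It is a simplified illustration, not the framework proposed in this paper, and it makes several assumptions: the 4-byte block size, the function names `signatures`, `delta`, and `patch`, and the use of MD5 as the strong hash are all hypothetical choices for readability. The machine holding the outdated file sends one weak rolling checksum and one strong hash per block. The machine holding the current file slides a window over it, emits a block reference when both checksums match, and otherwise sends literal bytes.

```python
from __future__ import annotations

import hashlib

BLOCK = 4        # hypothetical block size; real deployments use much larger blocks
MOD = 1 << 16    # the weak checksum keeps two 16-bit halves


def weak_checksum(data: bytes) -> tuple[int, int]:
    """Simplified rsync-style weak checksum of one window, as halves (a, b)."""
    n = len(data)
    a = sum(data) % MOD
    b = sum((n - i) * x for i, x in enumerate(data)) % MOD
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide the window one byte right in O(1) instead of recomputing it."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - n * out_byte + a) % MOD
    return a, b


def signatures(old: bytes, block: int = BLOCK) -> dict:
    """Outdated side: map weak checksum -> [(strong hash, offset)] for each full block."""
    table: dict = {}
    for off in range(0, len(old) - block + 1, block):
        chunk = old[off:off + block]
        a, b = weak_checksum(chunk)
        strong = hashlib.md5(chunk).digest()
        table.setdefault((b << 16) | a, []).append((strong, off))
    return table


def delta(new: bytes, table: dict, block: int = BLOCK) -> list:
    """Current side: encode `new` as ('copy', offset) and ('lit', bytes) operations."""
    ops, literal = [], bytearray()
    i, n = 0, len(new)
    a = b = None
    while i + block <= n:
        if a is None:  # (re)start the rolling checksum after a match
            a, b = weak_checksum(new[i:i + block])
        match = None
        for strong, off in table.get((b << 16) | a, ()):
            # Only a weak hit triggers the more expensive strong comparison.
            if hashlib.md5(new[i:i + block]).digest() == strong:
                match = off
                break
        if match is not None:
            if literal:
                ops.append(("lit", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match))
            i += block
            a = None
        else:
            literal.append(new[i])
            if i + block < n:
                a, b = roll(a, b, new[i], new[i + block], block)
            i += 1
    literal.extend(new[i:])  # trailing bytes shorter than one block
    if literal:
        ops.append(("lit", bytes(literal)))
    return ops


def patch(old: bytes, ops: list, block: int = BLOCK) -> bytes:
    """Outdated side: rebuild the current file from local blocks plus literals."""
    out = bytearray()
    for kind, val in ops:
        out += old[val:val + block] if kind == "copy" else val
    return bytes(out)


old = b"the quick brown fox jumps over the lazy dog"
new = b"the quick red fox jumps over the lazy cat"
ops = delta(new, signatures(old))
assert patch(old, ops) == new
```

Only the signatures and the delta cross the link, so the bandwidth cost depends on how much of the current file can be expressed as references to blocks the other side already holds. The block size drives the tradeoff: smaller blocks find more matches but cost more signature bytes per unit of the outdated file.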