SCC '12 Proceedings of the 2012 SC Companion: High Performance Computing, Networking Storage and Analysis
As big-data processing and analysis come to dominate the usage of Cloud systems, the need for Cloud-hosted data scheduling and optimization services increases. One key component of such a service is the ability to estimate available bandwidth and achievable throughput, since all scheduling and optimization decisions are built on top of this information. The biggest challenge in providing these estimation capabilities is dynamically deciding what proportion of the actual dataset, when transferred, yields an accurate estimate of the bandwidth and throughput achievable by transferring the whole dataset. That proportion of the data is called the sampling size (or probe size). Although small fixed sample sizes worked well for high-latency, low-bandwidth networks in the past, high-bandwidth networks require much larger and more dynamic sample sizes, since an accurate estimate now also depends on how quickly the transfer protocol can saturate the high-capacity network link. In this study, we present a model that decides the optimal sampling size based on the data size and the estimated capacity of the network. Our results show that the predicted sampling size closely matches the empirically best sampling size for a given file transfer in the majority of cases.
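The core idea above — that the probe must be large enough for the transfer protocol to saturate a high-capacity link — can be sketched as a simple heuristic. The function below is an illustrative model, not the paper's actual formula: it assumes (hypothetically) that a TCP-like protocol needs on the order of twice the bandwidth-delay product (BDP) of data to ramp up through slow start, and it bounds the probe below by a fixed minimum and above by a fraction of the file size. All parameter names and constants here are assumptions for illustration.

```python
def optimal_sample_size(file_size, capacity_bps, rtt_s,
                        min_sample=1 << 20, max_fraction=0.2):
    """Estimate a sampling (probe) size for throughput estimation.

    Illustrative heuristic (not the paper's model): the probe must be
    large enough for a TCP-like protocol to saturate the link.  During
    slow start the congestion window doubles each RTT until it reaches
    the bandwidth-delay product (BDP), sending roughly 2*BDP bytes in
    total, so we probe with a small multiple of the BDP, bounded below
    by a fixed minimum and above by a fraction of the file size.
    """
    bdp = capacity_bps / 8 * rtt_s        # bytes in flight at saturation
    sample = max(min_sample, 2 * bdp)     # cover the slow-start ramp-up
    return int(min(sample, max_fraction * file_size))


# On a fast, long path the probe grows with the BDP:
# 10 Gbps, 50 ms RTT -> BDP = 62.5 MB, probe = 125 MB (for a 1 GB file).
fast = optimal_sample_size(10**9, 10**10, 0.05)

# On a slow, short path the fixed minimum dominates:
# 100 Mbps, 10 ms RTT -> 2*BDP = 250 KB, probe = 1 MiB floor.
slow = optimal_sample_size(10**8, 10**8, 0.01)
```

This captures the abstract's observation that a small fixed probe (here, the 1 MiB floor) suffices on low-bandwidth paths, while high-bandwidth paths need a probe that scales with how much data the protocol must send before saturating the link.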