Twister: a runtime for iterative MapReduce
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Large scale data analytics on clouds
Proceedings of the fourth international workshop on Cloud data management
The cyberinfrastructure supporting science will likely include large-scale simulation systems headed to exascale, combined with cloud-like systems supporting data-intensive and high-throughput computing, pleasingly parallel jobs, and the long tail of science. Clouds offer economies of scale, elasticity supporting real-time and interactive use, and powerful new programming models such as MapReduce. We stress that iterative extensions of MapReduce will be necessary to get good performance for several data mining (analytics) applications. We give several illustrations, mainly from bioinformatics. We suggest that the data deluge implies a corresponding increase in the computational resources needed to support analysis, and that this in turn suggests new architectures for large-scale data repositories.