Data intensive applications on clouds

Authors:
Geoffrey C. Fox
Affiliations:
Indiana University, Bloomington, IN, USA
Venue:
Proceedings of the second international workshop on Data intensive computing in the clouds
Year:
2011

Citing 1
Cited 1

Twister: a runtime for iterative MapReduce

Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing

Large scale data analytics on clouds

Proceedings of the fourth international workshop on Cloud data management

Abstract

The cyberinfrastructure supporting science will, it appears, include large-scale simulation systems headed to exascale, combined with cloud-like systems supporting data-intensive and high-throughput computing, pleasingly parallel jobs, and the long tail of science. Clouds offer economies of scale, elasticity supporting real-time and interactive use, and powerful new programming models such as MapReduce. We stress that iterative extensions of MapReduce will be necessary to get good performance on several data mining (analytics) applications. We give several illustrations, mainly from bioinformatics. We suggest that the data deluge implies a corresponding increase in the computational resources needed to support analysis, and that this in turn suggests new architectures for large-scale data repositories.