Large data and computation in a hazard map workflow using Hadoop and Netezza architectures

  • Authors:
  • Shivaswamy Rohit; Abani K. Patra; Vipin Chaudhary

  • Affiliations:
  • University at Buffalo, New York; University at Buffalo, New York; University at Buffalo, New York

  • Venue:
  • DISCS-2013 Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems
  • Year:
  • 2013

Abstract

Uncertainty Quantification (UQ) using simulation ensembles leads to the twin challenges of managing large amounts of data and performing CPU-intensive computing. While algorithmic innovations using surrogates, localization, and parallelization can make the problem feasible, one is still left with very large data and compute tasks. Such integration of large-scale data analytics and computationally expensive tasks is increasingly common. We present here an approach to this problem that uses a mix of hardware and a workflow that maps tasks to the appropriate hardware. We experiment with two computing environments: the first integrates a Netezza data warehouse appliance with a high-performance cluster, and the second is a Hadoop-based environment. Our approach is based on segregating the data-intensive and compute-intensive tasks and assigning the right architecture to each. We present the computing models and the new schemes in the context of generating probabilistic hazard maps using ensemble runs of the volcanic debris avalanche simulator TITAN2D and UQ methodology.
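To illustrate the task-segregation idea described in the abstract, the following is a minimal sketch (not the authors' implementation) of a dispatcher that tags each workflow stage as compute-intensive (e.g., a TITAN2D ensemble member) or data-intensive (e.g., hazard-map aggregation) and routes it to a matching backend, such as an HPC cluster or a Hadoop/Netezza-style data engine. All names (Task, dispatch, submit_to_hpc, submit_to_data_engine) and the placeholder backends are hypothetical assumptions for illustration only.

```python
"""Sketch of mapping workflow tasks to the architecture suited to them.

Assumptions: task names, backends, and the dispatch interface are
hypothetical; in the paper's setting the compute backend would submit
TITAN2D runs to an HPC cluster and the data backend would push
aggregation to Hadoop or a Netezza appliance.
"""
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    name: str
    kind: str                      # "compute" or "data"
    action: Callable[[], object]   # work to perform (placeholder here)


def dispatch(tasks: List[Task],
             backends: Dict[str, Callable[[Task], object]]) -> Dict[str, object]:
    """Route each task to the backend registered for its kind."""
    return {t.name: backends[t.kind](t) for t in tasks}


def submit_to_hpc(task: Task) -> object:
    # Placeholder: a real backend would submit to a cluster scheduler.
    return task.action()


def submit_to_data_engine(task: Task) -> object:
    # Placeholder: a real backend would launch a Hadoop job or SQL query.
    return task.action()


if __name__ == "__main__":
    tasks = [
        Task("titan2d_run_001", "compute", lambda: "flow-depth field 001"),
        Task("hazard_map_aggregation", "data", lambda: "probabilistic hazard map"),
    ]
    results = dispatch(tasks, {"compute": submit_to_hpc,
                               "data": submit_to_data_engine})
    print(results)
```

The design choice sketched here mirrors the paper's stated approach at a high level only: the separation of concerns lives in the task metadata, so the same workflow description can be run against either experimental environment by swapping the backend functions.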