High performance risk aggregation: addressing the data processing challenge the Hadoop MapReduce way

  • Authors:
  • Zhimin Yao;Blesson Varghese;Andrew Rau-Chaplin

  • Affiliations:
  • Dalhousie University, Halifax, NS, Canada;Dalhousie University, Halifax, NS, Canada;Dalhousie University, Halifax, NS, Canada

  • Venue:
  • Proceedings of the 4th ACM workshop on Scientific cloud computing
  • Year:
  • 2013

Abstract

Monte Carlo simulations employed for the analysis of portfolios of catastrophic risk process large volumes of data. These simulations are often too slow and too data intensive to be performed in real-time scenarios. Such simulations can benefit from a framework that exploits parallelism to address the computational challenge and provides a distributed file system to address the data challenge. To this end, the Apache Hadoop framework is chosen for the simulation reported in this paper, so that the computational challenge can be tackled using the MapReduce model and the data challenge can be addressed using the Hadoop Distributed File System. A parallel algorithm for the analysis of aggregate risk is proposed and implemented using the MapReduce model. An evaluation of the algorithm's performance indicates that the Hadoop MapReduce model offers a suitable framework for processing the large volumes of data involved in aggregate risk analysis. A simulation of aggregate risk employing 100,000 trials with 1000 catastrophic events per trial on a typical exposure set and contract structure is performed on multiple worker nodes in less than 6 minutes. The results indicate the scope and feasibility of MapReduce for tackling the computational and data challenges of aggregate risk analysis in real-time use.
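
To make the MapReduce formulation concrete, the sketch below shows one way such an aggregate analysis could be expressed as a Hadoop job: a mapper keys simulated event losses by trial identifier and a reducer sums them into a per-trial aggregate loss. This is an illustrative sketch only, not the authors' algorithm; the input format (comma-separated "trialId,eventLoss" records on HDFS), the class names, and the use of the reducer as a combiner are assumptions, and the contract terms applied in the paper (e.g., limits and deductibles) are omitted.

```java
// Illustrative sketch (assumed input format and class names, not the paper's code):
// mapper emits (trialId, eventLoss); reducer sums losses per trial.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class AggregateRiskJob {

    // Mapper: parse "trialId,eventLoss" records and emit (trialId, loss).
    public static class TrialLossMapper
            extends Mapper<LongWritable, Text, LongWritable, DoubleWritable> {
        private final LongWritable trialId = new LongWritable();
        private final DoubleWritable loss = new DoubleWritable();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length < 2) {
                return; // skip malformed records
            }
            trialId.set(Long.parseLong(fields[0].trim()));
            loss.set(Double.parseDouble(fields[1].trim()));
            context.write(trialId, loss);
        }
    }

    // Reducer: sum all event losses belonging to one trial to obtain that
    // trial's aggregate loss; contract terms would be applied here in a
    // fuller implementation.
    public static class TrialLossReducer
            extends Reducer<LongWritable, DoubleWritable, LongWritable, DoubleWritable> {
        private final DoubleWritable aggregate = new DoubleWritable();

        @Override
        protected void reduce(LongWritable key, Iterable<DoubleWritable> values,
                              Context context) throws IOException, InterruptedException {
            double sum = 0.0;
            for (DoubleWritable v : values) {
                sum += v.get();
            }
            aggregate.set(sum);
            context.write(key, aggregate);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "aggregate risk analysis");
        job.setJarByClass(AggregateRiskJob.class);
        job.setMapperClass(TrialLossMapper.class);
        job.setCombinerClass(TrialLossReducer.class); // summation is associative, so reuse as combiner
        job.setReducerClass(TrialLossReducer.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(DoubleWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // event-loss records on HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // per-trial aggregate losses
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

In this framing, a run of 100,000 trials with 1000 events per trial would simply be 100,000,000 input records distributed across worker nodes by HDFS, with the combiner reducing the volume of intermediate data shuffled to the reducers.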