Parallel Simulations for Analysing Portfolios of Catastrophic Event Risk
SCC '12: Proceedings of the 2012 SC Companion: High Performance Computing, Networking, Storage and Analysis
Monte Carlo simulations employed for the analysis of portfolios of catastrophic risk process large volumes of data. Often these simulations are not performed in real-time scenarios because they are slow and data intensive. Such simulations can benefit from a framework that exploits parallelism to address the computational challenge and provides a distributed file system to address the data challenge. To this end, the Apache Hadoop framework is chosen for the simulation reported in this paper: the computational challenge is tackled using the MapReduce model, and the data challenge is addressed using the Hadoop Distributed File System. A parallel algorithm for the analysis of aggregate risk is proposed and implemented using the MapReduce model. A performance evaluation of the algorithm indicates that Hadoop MapReduce offers a practical framework for processing large data in aggregate risk analysis. A simulation of aggregate risk comprising 100,000 trials with 1,000 catastrophic events per trial, on a typical exposure set and contract structure, completes on multiple worker nodes in less than 6 minutes. The result indicates the scope and feasibility of MapReduce for tackling the computational and data challenges in the analysis of aggregate risk for real-time use.
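The abstract does not reproduce the parallel algorithm itself, but the sketch below illustrates how an aggregate risk analysis of this shape could be expressed in Hadoop's MapReduce model: each map task processes one simulated trial (applying contract terms to every event loss and summing them), and the reduce task collapses the per-trial losses into a summary statistic. The input layout (one comma-separated line per trial), the contract terms (a per-event deductible and limit), and the output statistic (mean aggregate loss) are illustrative assumptions for this sketch, not the paper's actual design.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// A minimal sketch of aggregate risk analysis as a Hadoop MapReduce job.
// The input format, field layout, and contract terms are assumptions made
// for illustration; they are not taken from the paper.
public class AggregateRiskAnalysis {

  // Mapper: one input line per trial, assumed to look like
  // "trialId,loss1,loss2,...,lossN", where each loss is the ground-up
  // loss of one catastrophic event in that trial.
  public static class TrialMapper
      extends Mapper<LongWritable, Text, Text, DoubleWritable> {

    private static final double DEDUCTIBLE = 1_000_000.0;  // assumed per-event term
    private static final double LIMIT      = 10_000_000.0; // assumed per-event term
    private final Text portfolio = new Text("portfolio");
    private final DoubleWritable trialLoss = new DoubleWritable();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String[] fields = value.toString().split(",");
      double aggregate = 0.0;
      // Apply the (assumed) contract terms to each event loss in the
      // trial, then sum to obtain the trial's aggregate loss.
      for (int i = 1; i < fields.length; i++) {
        double groundUp = Double.parseDouble(fields[i]);
        aggregate += Math.min(Math.max(groundUp - DEDUCTIBLE, 0.0), LIMIT);
      }
      trialLoss.set(aggregate);
      context.write(portfolio, trialLoss);
    }
  }

  // Reducer: collapses the per-trial losses into a mean aggregate loss.
  // A real analysis would typically emit the full loss distribution or
  // exceedance probabilities; a mean keeps the sketch short.
  public static class LossReducer
      extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {

    @Override
    protected void reduce(Text key, Iterable<DoubleWritable> values, Context context)
        throws IOException, InterruptedException {
      double sum = 0.0;
      long trials = 0;
      for (DoubleWritable v : values) {
        sum += v.get();
        trials++;
      }
      context.write(key, new DoubleWritable(sum / trials));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "aggregate-risk-analysis");
    job.setJarByClass(AggregateRiskAnalysis.class);
    job.setMapperClass(TrialMapper.class);
    job.setReducerClass(LossReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(DoubleWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // trial data on HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // results on HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Because trials are independent, the map phase parallelizes trivially across worker nodes, and HDFS keeps the trial data close to the tasks that read it; this is the division of labor the abstract attributes to MapReduce and the Hadoop Distributed File System respectively.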