Parallel accessing massive NetCDF data based on mapreduce

  • Authors:
  • Hui Zhao;SiYun Ai;ZhenHua Lv;Bo Li

  • Affiliations:
  • Key Laboratory of Trustworthy Computing of Shanghai, China and Institute of Software Engineering, East China Normal University, Shanghai, China; School of EEE Communication Software & Network, Nanyang Technological University, Singapore; Key Laboratory of Geographic Information Science, Ministry of Education, Geography Department, East China Normal University, Shanghai, China; Key Laboratory of Trustworthy Computing of Shanghai, China

  • Venue:
  • WISM'10: Proceedings of the 2010 International Conference on Web Information Systems and Mining
  • Year:
  • 2010

Abstract

NetCDF (Network Common Data Form) has been widely used in the terrestrial, marine and atmospheric sciences. A new parallel storage and access method for large-scale NetCDF scientific data is implemented on Hadoop, and the retrieval method is implemented on MapReduce. Argo data are used to demonstrate the method. Performance is compared in a distributed environment of PCs using different data scales and different numbers of tasks. The experimental results show that the parallel method can store and access large-scale NetCDF data efficiently.
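The abstract says only that retrieval runs on MapReduce over Hadoop, so the block below illustrates the general map/shuffle/reduce pattern and is not the authors' implementation. It assumes NetCDF variables have already been flattened into hypothetical `(platform_id, lat, lon, temperature)` records and grouped into splits standing in for HDFS blocks. Threads stand in for cluster nodes. The bounding-box query, the mean-temperature reduction, and every function name here are illustrative assumptions.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple

# Hypothetical flattened record: (platform_id, latitude, longitude, temperature_degC).
Record = Tuple[str, float, float, float]
BBox = Tuple[float, float, float, float]  # (lat_min, lat_max, lon_min, lon_max)


def map_split(split: List[Record], bbox: BBox) -> List[Tuple[str, float]]:
    """Map phase: scan one split and emit (platform_id, temperature) for matches."""
    lat_min, lat_max, lon_min, lon_max = bbox
    return [
        (pid, temp)
        for pid, lat, lon, temp in split
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max
    ]


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Shuffle phase: group intermediate values by key."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(values: List[float]) -> float:
    """Reduce phase: mean temperature for one platform."""
    return sum(values) / len(values)


def query(splits: List[List[Record]], bbox: BBox, workers: int = 4) -> Dict[str, float]:
    # One map task per split, run concurrently (threads stand in for cluster nodes).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = pool.map(map_split, splits, [bbox] * len(splits))
        intermediate = [pair for chunk in mapped for pair in chunk]
    grouped = shuffle(intermediate)
    return {key: reduce_group(vals) for key, vals in sorted(grouped.items())}


if __name__ == "__main__":
    splits = [
        [("A", 10.0, 120.0, 20.0), ("B", 50.0, 160.0, 5.0)],
        [("A", 12.0, 121.0, 22.0), ("C", 11.0, 125.0, 18.0)],
    ]
    print(query(splits, (0.0, 20.0, 110.0, 130.0)))  # {'A': 21.0, 'C': 18.0}
```

Because each map task touches only its own split, adding splits or workers scales the scan. Only the much smaller filtered pairs pass through the shuffle. This is the general reason MapReduce suits large NetCDF retrievals, though the paper's actual partitioning and query logic may differ.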