A novel approach to data deduplication over the engineering-oriented cloud systems

  • Authors:
  • Zhe Sun;Jun Shen;Jianming Yong

  • Affiliations:
  • School of Information Systems and Technology, University of Wollongong, Wollongong, NSW, Australia and Information Management Center, Huaneng Shandong Shidao Bay Nuclear Power Company, Ltd, Longch ...;School of Information Systems and Technology, University of Wollongong, Wollongong, NSW, Australia;School of Information Systems, University of Southern Queensland, Toowoomba, QLD, Australia

  • Venue:
  • Integrated Computer-Aided Engineering
  • Year:
  • 2013

Abstract

This paper presents a deduplicated storage system for engineering-oriented cloud computing platforms. The system, which manages both data and duplicates across the cloud, consists of two major components: a front-end deduplication application and a back-end mass storage system. The Hadoop Distributed File System (HDFS) is a distributed file system commonly used on the cloud, typically together with the Hadoop database HBase. We use HDFS to build the mass storage system and HBase to build a fast indexing system. Combined with the deduplication application, these form a scalable, parallel, deduplicated cloud storage system. We use VMware to generate a simulated cloud environment. The simulation results demonstrate that our deduplication storage system is sufficiently accurate and efficient for distributed and cooperative data-intensive engineering applications.
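The division of labour described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the fingerprinting scheme or the deduplication granularity, so the code assumes file-level deduplication with SHA-256 fingerprints. The class `DedupStore`, its methods, and the `/dedup/` path prefix are hypothetical names, and plain Python dictionaries stand in for the HBase index and the HDFS storage.

```python
import hashlib


class DedupStore:
    """Toy file-level deduplicating store (hypothetical; in-memory stand-ins only)."""

    def __init__(self):
        self.index = {}  # stand-in for the HBase index: fingerprint -> storage path
        self.blobs = {}  # stand-in for HDFS: storage path -> file content
        self.names = {}  # logical file name -> fingerprint

    def put(self, name, data):
        """Store data under name; return True only if new content was written."""
        # Assumption: SHA-256 over the whole file serves as the fingerprint.
        fingerprint = hashlib.sha256(data).hexdigest()
        is_new = fingerprint not in self.index
        if is_new:
            # Unseen content: write it once and record where it lives.
            path = "/dedup/" + fingerprint
            self.blobs[path] = data
            self.index[fingerprint] = path
        # Every logical name maps to a fingerprint, duplicate or not.
        self.names[name] = fingerprint
        return is_new

    def get(self, name):
        """Resolve name -> fingerprint -> storage path -> content."""
        return self.blobs[self.index[self.names[name]]]


store = DedupStore()
first = store.put("design_v1.dat", b"turbine blade mesh")
second = store.put("design_copy.dat", b"turbine blade mesh")  # duplicate content
third = store.put("design_v2.dat", b"turbine blade mesh, revised")
```

In this sketch the second upload adds only a name-to-fingerprint entry, so three logical files occupy two stored objects. In a real deployment the index lookup would be a request to HBase and the write would go to HDFS; both are simulated here.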