Big Data computing provides a promising new opportunity for scientific discoveries and innovations. However, it also poses a significant challenge to the high-end computing community. An effective I/O solution is urgently required to support big data applications running on high-end computing systems. In this study, we propose a new approach, DDiHA (Data Deduplication in Hybrid Architecture), to improve write performance for write-intensive big data applications. The DDiHA approach uses data deduplication to reduce the size of data volumes before they are transferred and written to storage. A hybrid architecture is introduced to facilitate data deduplication. Both a theoretical study and prototype verification were conducted to evaluate the DDiHA approach. Initial results show that, given the same compute resources, the DDiHA system outperformed the conventional architecture, even though it introduces additional computation workload from data deduplication. The DDiHA approach reduces the amount of data transferred across the network and improves I/O system performance. It shows promise for write-intensive big data applications.
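The idea of shrinking data before it is transferred and written can be illustrated with a minimal sketch. The abstract does not say how DDiHA chunks or fingerprints data, so fixed-size chunks, SHA-256 fingerprints, an in-memory dictionary standing in for the storage side, and the function names `deduplicate`, `write_to_storage`, and `restore` are all assumptions made for illustration, not the authors' design.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; not specified in the abstract


def deduplicate(data: bytes, store: dict, chunk_size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks and drop chunks already known.

    Returns (recipe, new_chunks):
      recipe     - ordered list of fingerprints needed to rebuild the data
      new_chunks - fingerprint -> chunk, only for chunks absent from store,
                   i.e. the data that would still have to be transferred
    """
    recipe = []
    new_chunks = {}
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        recipe.append(fingerprint)
        if fingerprint not in store and fingerprint not in new_chunks:
            new_chunks[fingerprint] = chunk
    return recipe, new_chunks


def write_to_storage(store: dict, new_chunks: dict) -> None:
    """Hypothetical storage-side step: persist only the unique chunks."""
    store.update(new_chunks)


def restore(recipe: list, store: dict) -> bytes:
    """Rebuild the original byte stream from its recipe."""
    return b"".join(store[fingerprint] for fingerprint in recipe)


if __name__ == "__main__":
    store = {}
    data = b"A" * (3 * CHUNK_SIZE) + b"B" * CHUNK_SIZE
    recipe, new_chunks = deduplicate(data, store)
    sent = sum(len(chunk) for chunk in new_chunks.values())
    print(f"original bytes: {len(data)}, bytes to transfer: {sent}")
    write_to_storage(store, new_chunks)
    assert restore(recipe, store) == data
```

In this sketch the fingerprinting cost is paid before the transfer, which mirrors the trade-off the abstract describes: extra computation in exchange for less data crossing the network.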