Space/time trade-offs in hash coding with allowable errors
Communications of the ACM
Principled design of the modern Web architecture
ACM Transactions on Internet Technology (TOIT)
Distributed caching with memcached
Linux Journal
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Bigtable: a distributed storage system for structured data
OSDI '06 Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7
Experimental Analysis of InfiniBand Transport Services on WAN
NAS '08 Proceedings of the 2008 International Conference on Networking, Architecture, and Storage
PNUTS: Yahoo!'s hosted data serving platform
Proceedings of the VLDB Endowment
Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems
Hadoop high availability through metadata replication
Proceedings of the first international workshop on Cloud data management
Benchmarking cloud serving systems with YCSB
Proceedings of the 1st ACM symposium on Cloud computing
Providing a cloud network infrastructure on a supercomputer
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
RDMA-Based Job Migration Framework for MPI over InfiniBand
CLUSTER '10 Proceedings of the 2010 IEEE International Conference on Cluster Computing
Unifying UPC and MPI runtimes: experience with MVAPICH
Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model
Hadoop acceleration through network levitated merge
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
On the duality of data-intensive file system design: reconciling HDFS and PVFS
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Memcached Design on High Performance RDMA Capable Interconnects
ICPP '11 Proceedings of the 2011 International Conference on Parallel Processing
Using the Gfarm File System as a POSIX Compatible Storage Platform for Hadoop MapReduce Applications
GRID '11 Proceedings of the 2011 IEEE/ACM 12th International Conference on Grid Computing
High-Performance Design of HBase with RDMA over InfiniBand
IPDPS '12 Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium
Peak performance: remote memory revisited
Proceedings of the Ninth International Workshop on Data Management on New Hardware
Does RDMA-based enhanced Hadoop MapReduce need a new performance model?
Proceedings of the 4th annual Symposium on Cloud Computing
NSDI'14 Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation
Hi-index | 0.00 |
Hadoop Distributed File System (HDFS) acts as the primary storage of Hadoop and has been adopted by reputed organizations (Facebook, Yahoo! etc.) due to its portability and fault-tolerance. The existing implementation of HDFS uses Java-socket interface for communication which delivers suboptimal performance in terms of latency and throughput. For data-intensive applications, network performance becomes key component as the amount of data being stored and replicated to HDFS increases. In this paper, we present a novel design of HDFS using Remote Direct Memory Access (RDMA) over InfiniBand via JNI interfaces. Experimental results show that, for 5GB HDFS file writes, the new design reduces the communication time by 87% and 30% over 1Gigabit Ethernet (1GigE) and IP-over-InfiniBand (IPoIB), respectively, on QDR platform (32Gbps). For HBase, the Put operation performance is improved by 26% with our design. To the best of our knowledge, this is the first design of HDFS over InfiniBand networks.