Facebook recently deployed Facebook Messages, its first-ever user-facing application built on the Apache Hadoop platform. Apache HBase is a database-like layer built on Hadoop designed to support billions of messages per day. This paper describes why Facebook chose Hadoop and HBase over other systems such as Apache Cassandra and Voldemort and discusses the application's requirements for consistency, availability, partition tolerance, data model, and scalability. We explore the enhancements made to Hadoop to make it a more effective real-time system, the tradeoffs we made while configuring the system, and the significant advantages this solution has over the sharded MySQL database scheme used in other applications at Facebook and at many other web-scale companies. We discuss the motivations behind our design choices, the challenges that we face in day-to-day operations, and future capabilities and improvements still under development. We offer these observations on the deployment as a model for other companies that are contemplating a Hadoop-based solution over traditional sharded RDBMS deployments.