SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Dynamo: amazon's highly available key-value store
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
The Chubby lock service for loosely-coupled distributed systems
OSDI '06 Proceedings of the 7th symposium on Operating systems design and implementation
Bigtable: A Distributed Storage System for Structured Data
ACM Transactions on Computer Systems (TOCS)
A break in the clouds: towards a cloud definition
ACM SIGCOMM Computer Communication Review
Cassandra: structured storage system on a P2P network
Proceedings of the 28th ACM symposium on Principles of distributed computing
Hive: a warehousing solution over a map-reduce framework
Proceedings of the VLDB Endowment
HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads
Proceedings of the VLDB Endowment
Communications of the ACM
Benchmarking cloud serving systems with YCSB
Proceedings of the 1st ACM symposium on Cloud computing
Heuristics-Based Query Processing for Large RDF Graphs Using Cloud Computing
IEEE Transactions on Knowledge and Data Engineering
Hi-index | 0.01 |
Cloud computing is rapidly growing in popularity as a solution for processing and retrieving huge amounts of data over clusters of inexpensive commodity hardware. The most common data model utilized by cloud computing software is the NoSQL data model. While this data model is extremely scalable, it is much more efficient for simple retrievals and scans than for the complex analytical queries typical in a relational database model. In this paper, we evaluate emerging cloud computing technologies using a representative use case. Our use case involves analyzing telecommunications logs for performance monitoring and quality assurance. Clearly, the size of such logs is growing exponentially as more devices communicate more frequently and the amount of data being transferred steadily increases. We analyze potential solutions to provide a scalable database which supports both retrieval and analysis. We will investigate and analyze all the major open source cloud computing solutions and designs. We then choose the most applicable subset of these technologies for experimentation. We provide a performance evaluation of these products, and we analyze our results and make recommendations. This paper provides a comprehensive survey of technologies for scalable data processing and an in-depth performance evaluation of these technologies.