Scalable queries for large datasets using cloud computing: a case study

  • Authors:
  • James P. McGlothlin; Latifur Khan

  • Affiliations:
  • The University of Texas at Dallas, Richardson, TX; The University of Texas at Dallas, Richardson, TX

  • Venue:
  • Proceedings of the 15th Symposium on International Database Engineering & Applications
  • Year:
  • 2011

Abstract

Cloud computing is rapidly growing in popularity as a solution for processing and retrieving huge amounts of data over clusters of inexpensive commodity hardware. The most common data model used by cloud computing software is the NoSQL data model. While this data model is extremely scalable, it is much more efficient for simple retrievals and scans than for the complex analytical queries typical of a relational database model. In this paper, we evaluate emerging cloud computing technologies using a representative use case: analyzing telecommunications logs for performance monitoring and quality assurance. The size of such logs is growing exponentially as more devices communicate more frequently and the amount of data transferred steadily increases. We analyze potential solutions for providing a scalable database that supports both retrieval and analysis. We investigate and analyze the major open-source cloud computing solutions and designs, and then choose the most applicable subset of these technologies for experimentation. We provide a performance evaluation of these products, analyze our results, and make recommendations. This paper thus provides both a comprehensive survey of technologies for scalable data processing and an in-depth performance evaluation of them.