We live in the data age: storage hardware and software have evolved to the point where it is very cheap to store large volumes of data, both structured and unstructured. The growing popularity of social media has contributed to the accumulation of large, mostly unstructured data sets which, when analyzed, can yield valuable insight. Extracting meaningful, useful, and accurate information from very large data sets in a timely manner is a complex task that requires careful selection of the right hardware, software, and data model. This paper analyzes the problem of storing, processing, and retrieving meaningful insight from petabytes of data. It surveys current distributed and parallel data processing technologies and, based on them, proposes an architecture that can be used to solve the analyzed problem.
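To make the processing model behind these distributed technologies concrete, the following is a minimal single-process sketch of the map–shuffle–reduce pattern that frameworks such as Hadoop execute at cluster scale. The word-count task and all function names here are illustrative choices, not taken from the paper:

```python
from collections import defaultdict

def map_phase(document):
    """Emit intermediate (key, value) pairs: here, (word, 1) per word."""
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    """Group intermediate values by key, as the framework would do
    between the map and reduce stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key: here, sum the counts."""
    return key, sum(values)

documents = ["big data is big", "data at scale"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
# counts == {"big": 2, "data": 2, "is": 1, "at": 1, "scale": 1}
```

In a real deployment the map and reduce calls run in parallel on many machines and the shuffle moves data over the network; this sketch only shows the data flow that makes such parallelism possible.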