Distributed parallel architecture for storing and processing large datasets

  • Authors:
  • Catalin Boja
  • Adrian Pocovnicu

  • Affiliations:
  • Department of Economic Informatics and Cybernetics, Bucharest Academy of Economic Studies, Bucharest, Romania (both authors)

  • Venue:
  • SEPADS'12/EDUCATION'12 Proceedings of the 11th WSEAS international conference on Software Engineering, Parallel and Distributed Systems, and proceedings of the 9th WSEAS international conference on Engineering Education
  • Year:
  • 2012

Abstract

We live in the data age: storage technologies, both hardware and software, have evolved to the point at which it is very cheap to store large volumes of data, structured and unstructured. The growing popularity of social media has contributed to the accumulation of large, mostly unstructured data volumes which, when analyzed, can yield valuable insight. Extracting meaningful, useful, and accurate information from very large data sets in a timely manner is a complex task that requires careful selection of the right hardware, software, and data model. This paper analyzes the problem of storing, processing, and retrieving meaningful insight from petabytes of data. It surveys current distributed and parallel data processing technologies and, based on them, proposes an architecture that can be used to solve the analyzed problem.