Can we analyze big data inside a DBMS?

  • Authors: Carlos Ordonez
  • Affiliations: University of Houston, Houston, TX, USA
  • Venue: Proceedings of the Sixteenth International Workshop on Data Warehousing and OLAP
  • Year: 2013

Abstract

Relational DBMSs remain the main data management technology, despite the big data analytics and NoSQL waves. For data analytics in a broad sense, however, there are plenty of non-DBMS tools, including statistical languages, matrix packages, generic data mining programs and large-scale parallel systems, with the last of these being the main technology for big data analytics. Such large-scale systems are mostly based on the Hadoop distributed file system and MapReduce. Thus it would seem that a DBMS is not a good technology for analyzing big data beyond SQL queries, and that it should act merely as a reliable and fast data repository. In this survey, we argue that this is not the case, explaining important research that has enabled analytics on large databases inside a DBMS. However, we also argue that DBMSs cannot compete with parallel systems like MapReduce for analyzing web-scale text data. Therefore, the two technologies will keep influencing each other. We conclude by proposing long-term research issues in light of the "big data analytics" trend.