Inside "Big Data management": ogres, onions, or parfaits?

Authors:
Vinayak Borkar; Michael J. Carey; Chen Li
Affiliations:
UC Irvine; UC Irvine; UC Irvine
Venue:
Proceedings of the 15th International Conference on Extending Database Technology
Year:
2012

Abstract

In this paper we review the history of systems for managing "Big Data" as well as today's activities and architectures from the (perhaps biased) perspective of three "database guys" who have been watching this space for a number of years and are currently working together on "Big Data" problems. Our focus is on architectural issues, and particularly on the components and layers that have been developed recently (in open source and elsewhere) and on how they are being used (or abused) to tackle challenges posed by today's notion of "Big Data". Also covered is the approach we are taking in the ASTERIX project at UC Irvine, where we are developing our own set of answers to the questions of the "right" components and the "right" set of layers for taming the "Big Data" beast. We close by sharing our opinions on what some of the important open questions are in this area as well as our thoughts on how the data-intensive computing community might best seek out answers.