Principles of distributed data management in 2020?

Authors:
Patrick Valduriez
Affiliations:
INRIA and LIRMM, Montpellier, France
Venue:
DEXA'11 Proceedings of the 22nd international conference on Database and expert systems applications - Volume Part I
Year:
2011

Citing 11

Parallel database systems: open problems and new issues
Distributed and Parallel Databases - Special issue: Research topics in distributed and parallel databases

Scaling Access to Heterogeneous Data Sources with DISCO
IEEE Transactions on Knowledge and Data Engineering

MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6

Bigtable: A Distributed Storage System for Structured Data
ACM Transactions on Computer Systems (TOCS)

Pig latin: a not-so-foreign language for data processing
Proceedings of the 2008 ACM SIGMOD international conference on Management of data

PNUTS: Yahoo!'s hosted data serving platform
Proceedings of the VLDB Endowment

Managing scientific data
Communications of the ACM

Principles of Distributed Database Systems
Principles of Distributed Database Systems

A rule-based language for web data management
Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

Distributed data management in 2020?
ICDE '11 Proceedings of the 2011 IEEE 27th International Conference on Data Engineering

Data management in large-scale p2p systems
VECPAR'04 Proceedings of the 6th international conference on High Performance Computing for Computational Science

Cited 1

Update Propagator for Joint Scalable Storage
Fundamenta Informaticae - Concurrency Specification and Programming (CS&P)

Abstract

With the advent of high-speed networks, fast commodity hardware, and the web, distributed data sources have become ubiquitous. The third edition of the Özsu-Valduriez textbook Principles of Distributed Database Systems [10] reflects the evolution of distributed data management and distributed database systems. In this new edition, the fundamental principles of distributed data management can still be presented along the three dimensions of earlier editions: distribution, heterogeneity, and autonomy of the data sources. In retrospect, the focus on fundamental principles and generic techniques has been useful not only for understanding and teaching the material, but also for enabling an infinite number of variations. The primary application of these generic techniques has obviously been distributed and parallel DBMS versions. Today, to support the requirements of important data-intensive applications (e.g. social networks, web data analytics, scientific applications), new distributed data management techniques and systems (e.g. MapReduce, Hadoop, SciDB, PNUTS, Pig Latin) are emerging and receiving much attention from the research community. Although they do well in terms of consistency/flexibility/performance trade-offs for specific applications, they seem to be ad hoc and might hurt data interoperability. The key questions I discuss are: What are the fundamental principles behind the emerging solutions? Is there any generic architectural model to explain those principles? Do we need new foundations to look at data distribution?