Brown Dwarf: A fully-distributed, fault-tolerant data warehousing system

Authors:
Katerina Doka;Dimitrios Tsoumakos;Nectarios Koziris
Venue:
Journal of Parallel and Distributed Computing
Year:
2011



Abstract

In this paper we present the Brown Dwarf, a distributed data analytics system designed to efficiently store, query and update multidimensional data over commodity network nodes, without the use of any proprietary tool. Brown Dwarf distributes a centralized indexing structure among peers on the fly, reducing cube creation and querying times by enforcing parallelization. Analytical queries are naturally performed on-line through cooperating nodes that form an unstructured Peer-to-Peer overlay. Updates are also performed on-line, eliminating the usually costly overnight process. Moreover, the system employs an adaptive replication scheme that adjusts to workload skew as well as network churn by expanding or shrinking the units of the distributed data structure. Our system has been thoroughly evaluated on an actual testbed: it accelerates cube creation and querying by up to several tens of times compared to the centralized solution by exploiting the capabilities of the available network nodes working in parallel. It also adapts quickly even after sudden bursts in load and remains unaffected even when a considerable fraction of nodes fail frequently. These advantages are even more apparent for dense and skewed data cubes and workloads.
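
The adaptive replication idea mentioned in the abstract (expanding or shrinking units of the distributed structure as load changes) can be illustrated with a minimal sketch. This is not the paper's algorithm: the class `ReplicatedUnit`, the method `adapt`, and the `high`/`low` per-replica load thresholds are hypothetical names and parameters chosen only for illustration, and the one-replica-per-step policy is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ReplicatedUnit:
    """Hypothetical unit of a distributed structure with a variable replica count."""
    name: str
    replicas: int = 1

    def adapt(self, requests_per_sec: float, high: float, low: float,
              min_replicas: int = 1) -> int:
        """Adjust the replica count by at most one step and return it.

        Assumed policy (not taken from the paper): add a replica when the
        load per replica exceeds `high`, drop one when it falls below `low`.
        """
        per_replica = requests_per_sec / self.replicas
        if per_replica > high:
            self.replicas += 1          # expand under heavy load
        elif per_replica < low and self.replicas > min_replicas:
            self.replicas -= 1          # shrink when mostly idle
        return self.replicas


if __name__ == "__main__":
    unit = ReplicatedUnit("root")
    # A burst of load: the unit expands step by step until each replica
    # sees no more than `high` requests per second.
    for _ in range(3):
        unit.adapt(100, high=40, low=10)
    print(unit.replicas)
    # Load drops: the unit shrinks again, but never below min_replicas.
    unit.adapt(20, high=40, low=10)
    print(unit.replicas)
```

Using two thresholds rather than one leaves a band of load in which the replica count stays put, which avoids oscillating between expansion and shrinkage when load hovers near a single cut-off.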