Data placement in Bubba

Authors:
George Copeland;William Alexander;Ellen Boughter;Tom Keller
Affiliations:
MCC, 3500 West Balcones Center Drive, Austin, Texas (all authors)
Venue:
SIGMOD '88 Proceedings of the 1988 ACM SIGMOD international conference on Management of data
Year:
1988


Abstract

This paper examines the problem of data placement in Bubba, a highly-parallel system for data-intensive applications being developed at MCC. “Highly-parallel” implies that load balancing is a critical performance issue. “Data-intensive” means data is so large that operations should be executed where the data resides. As a result, data placement becomes a critical performance issue.

In general, determining the optimal placement of data across processing nodes for performance is a difficult problem. We describe our heuristic approach to solving the data placement problem in Bubba. We then present experimental results using a specific workload to provide insight into the problem. Several researchers have argued the benefits of declustering (i.e., spreading each base relation over many nodes). We show that as declustering is increased, load balancing continues to improve. However, for transactions involving complex joins, further declustering reduces throughput because of communications, startup, and termination overhead.

We argue that data placement, especially declustering, in a highly-parallel system must be considered early in the design, so that mechanisms can be included for supporting variable declustering, for minimizing the most significant overheads associated with large-scale declustering, and for gathering the required statistics.
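The declustering tradeoff described in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the placement heuristic or cost model used in Bubba: the relation "heats" (access rates), the round-robin placement, the unit `work` cost, and the per-node `overhead` term are all hypothetical values chosen for illustration. Each access to a relation declustered over `degree` nodes is assumed to cost `work / degree` on every participating node, plus a fixed `overhead` per node standing in for communication, startup, and termination costs.

```python
# Toy model of declustering (all names and numbers are illustrative assumptions).

# Hypothetical access rates ("heat") for five base relations.
HEATS = [8, 4, 2, 1, 1]


def node_loads(heats, num_nodes, degree, work=1.0, overhead=0.0):
    """Load per node when each relation is spread over `degree` consecutive nodes.

    Every access costs work/degree on each participating node, plus a fixed
    per-node overhead (an assumed stand-in for messaging and startup cost).
    """
    loads = [0.0] * num_nodes
    start = 0
    for heat in heats:
        per_node = heat * (work / degree + overhead)
        for i in range(degree):
            loads[(start + i) % num_nodes] += per_node
        # The next relation starts where this one ended (round-robin placement).
        start = (start + degree) % num_nodes
    return loads


def imbalance(loads):
    """Busiest node's load divided by the mean load (1.0 means perfectly balanced)."""
    return max(loads) / (sum(loads) / len(loads))


def best_degree(heats, num_nodes, degrees, overhead):
    """Degree whose busiest node carries the least load, i.e. the highest
    sustainable throughput in this toy model."""
    return min(
        degrees,
        key=lambda d: max(node_loads(heats, num_nodes, d, overhead=overhead)),
    )


if __name__ == "__main__":
    for d in (1, 2, 4, 8):
        print(d, imbalance(node_loads(HEATS, 8, d)))
    print(best_degree(HEATS, 8, [1, 2, 4, 8], overhead=0.05))
    print(best_degree(HEATS, 8, [1, 2, 4, 8], overhead=0.2))
```

In this sketch, load balance improves steadily as the degree of declustering grows, while the degree that minimizes the bottleneck node's load depends on the assumed overhead: with a small per-node overhead full declustering wins, and with a larger one an intermediate degree does.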