This paper examines the problem of data placement in Bubba, a highly-parallel system for data-intensive applications being developed at MCC. “Highly-parallel” implies that load balancing is a critical performance issue. “Data-intensive” means data is so large that operations should be executed where the data resides. As a result, data placement becomes a critical performance issue.

In general, determining the optimal placement of data across processing nodes for performance is a difficult problem. We describe our heuristic approach to solving the data placement problem in Bubba. We then present experimental results using a specific workload to provide insight into the problem. Several researchers have argued the benefits of declustering (i.e., spreading each base relation over many nodes). We show that as declustering is increased, load balancing continues to improve. However, for transactions involving complex joins, further declustering reduces throughput because of communications, startup and termination overhead.

We argue that data placement, especially declustering, in a highly-parallel system must be considered early in the design, so that mechanisms can be included for supporting variable declustering, for minimizing the most significant overheads associated with large-scale declustering, and for gathering the required statistics.
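The trade-off described above can be illustrated with a small toy model. This is only a sketch under stated assumptions and is not the placement heuristic used in Bubba: the function names, the consecutive-node placement rule, the per-relation access frequencies ("heats"), and the fixed per-node overhead are all hypothetical choices made for illustration.

```python
def node_loads(heats, degree, num_nodes):
    """Spread each relation's heat evenly over `degree` consecutive nodes.

    Assumption: relations are placed one after another around a ring of
    nodes. This is an illustrative rule, not the paper's heuristic.
    """
    loads = [0.0] * num_nodes
    start = 0
    for heat in heats:
        for k in range(degree):
            loads[(start + k) % num_nodes] += heat / degree
        start = (start + degree) % num_nodes
    return loads


def imbalance(loads):
    """Ratio of the hottest node's load to the mean load (1.0 is perfect)."""
    return max(loads) / (sum(loads) / len(loads))


def work_per_transaction(base_work, degree, per_node_overhead):
    """Useful work plus an assumed fixed startup/termination cost per node touched."""
    return base_work + degree * per_node_overhead


if __name__ == "__main__":
    heats = [8, 4, 2, 1, 1]  # hypothetical access frequencies of five relations
    for degree in (1, 2, 4, 8):
        skew = imbalance(node_loads(heats, degree, num_nodes=8))
        work = work_per_transaction(base_work=10.0, degree=degree,
                                    per_node_overhead=0.5)
        print(f"degree={degree} imbalance={skew:.3f} work={work:.1f}")
```

In this toy setting the imbalance ratio falls as the degree of declustering grows, while the total work per transaction rises with the number of nodes touched. That is the qualitative tension the abstract describes; the numbers themselves carry no experimental meaning.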