Parallelism in relational data base systems: architectural issues and design approaches

Authors:
Hamid Pirahesh; C. Mohan; Josephine Cheng; T. S. Liu; Pat Selinger
Affiliations:
Data Base Technology Institute, IBM Almaden Research Center, San Jose, CA; Data Base Technology Institute, IBM Almaden Research Center, San Jose, CA; Data Base Technology Institute, IBM Santa Teresa Laboratory, San Jose, CA; Data Base Technology Institute, IBM Santa Teresa Laboratory, San Jose, CA; Data Base Technology Institute, IBM Almaden Research Center, San Jose, CA
Venue:
DPDS '90 Proceedings of the second international symposium on Databases in parallel and distributed systems
Year:
1990



Abstract

With current systems, some important complex queries may take days to complete because of (1) the volume of data to be processed and (2) limited aggregate resources. Introducing parallelism addresses the first problem; cheaper but powerful computing resources solve the second. According to a survey by Brodie [1], only 10% of computerized data is in data bases. This is an argument for moving both a greater variety and a greater volume of data into data base systems. We conjecture that the primary reason for this low percentage is that data base management systems (DBMSs) still need to provide far greater functionality and improved performance compared to a combination of application programs and file systems. This paper addresses the issues and solutions relating to intraquery parallelism in a relational DBMS supporting SQL. Instead of focusing on only a few algorithms for a subset of the problems, we provide a broad framework for the study of the numerous issues that need to be addressed in supporting parallelism efficiently and flexibly. We also discuss the impact that parallelization of complex queries has on short transactions, which have stringent response time constraints. The pros and cons of the shared nothing, shared disks, and shared everything architectures for parallelism are enumerated. The impact of parallelism on a number of components of an industrial-strength DBMS is pointed out. The different stages of query processing during which parallelism may be gainfully employed are identified. The interactions between parallelism and the pipelining technique of traditional systems are analyzed. Finally, the performance implications of parallelizing a specific complex query are studied. This gives us a range of sample points for different parameters of a parallel system architecture, namely I/O and communication bandwidth as a function of aggregate MIPS.
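As a rough illustration of what intraquery parallelism over partitioned data can look like, the sketch below evaluates a single grouped SUM by splitting the input rows into partitions, aggregating each partition independently, and merging the partial results. It is not taken from the paper: the function names (`partition_rows`, `local_sum`, `parallel_group_sum`), the hash-partitioning choice, and the use of a thread pool to stand in for separate processors are all assumptions made for this example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_rows(rows, num_partitions):
    """Hash-partition (key, value) rows on the grouping key.

    Hypothetical stand-in for the way a partitioned system might
    spread a table across nodes; not the paper's scheme.
    """
    partitions = [[] for _ in range(num_partitions)]
    for key, value in rows:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions


def local_sum(partition):
    """Compute a partial SUM per key over one partition only."""
    totals = defaultdict(int)
    for key, value in partition:
        totals[key] += value
    return dict(totals)


def parallel_group_sum(rows, num_partitions=4):
    """Evaluate SELECT key, SUM(value) ... GROUP BY key in two phases."""
    partitions = partition_rows(rows, num_partitions)
    # Phase 1: each worker aggregates its own partition. Threads are used
    # only to keep the sketch self-contained; in CPython they do not give
    # real CPU parallelism for this kind of work.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(local_sum, partitions))
    # Phase 2: merge the partial results. Adding (rather than overwriting)
    # keeps the merge correct for any partitioning scheme.
    merged = defaultdict(int)
    for partial in partials:
        for key, total in partial.items():
            merged[key] += total
    return dict(merged)


if __name__ == "__main__":
    sample = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
    print(parallel_group_sum(sample))
```

The two-phase shape (local work, then a merge step) is the point of the example; how the partitions map onto processors, disks, or memory is exactly the kind of architectural choice the abstract says the paper compares.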