MOCHA: a self-extensible database middleware system for distributed data sources

Authors:
Manuel Rodríguez-Martínez; Nick Roussopoulos
Affiliations:
Department of Computer Science, University of Maryland, College Park; Department of Computer Science, University of Maryland, College Park
Venue:
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Year:
2000

Abstract

We present MOCHA, a new self-extensible database middleware system designed to interconnect distributed data sources. MOCHA is designed to scale to large environments and is based on the idea that some of the user-defined functionality in the system should be deployed by the middleware system itself. This is realized by shipping Java code that implements either advanced data types or tailored query operators to remote data sources and executing it there. Optimized query plans push the evaluation of powerful data-reducing operators to the data source sites, while data-inflating operators are executed near the client's site. The Volume Reduction Factor, a new and more explicit metric introduced in this paper, is used to select the best site at which to execute each query operator, and it is shown to be more accurate than the standard selectivity factor alone. MOCHA has been implemented in Java and runs on top of Informix and Oracle. We present the architecture of MOCHA, the ideas behind it, and a performance study using scientific data and queries. The results of this study demonstrate that MOCHA provides a more flexible, scalable, and efficient framework for distributed query processing than existing middleware solutions.
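The contrast between the Volume Reduction Factor and the selectivity factor can be illustrated with a small sketch. It assumes, consistent with the metric's name, that an operator's VRF is the volume of data (in bytes) it leaves to be transmitted divided by the volume of data it reads, and that an operator with VRF < 1 counts as data-reducing. The class `OperatorStats`, the function `place`, the operator names, and all sizes are hypothetical and chosen only for illustration. The single-threshold placement rule is also a simplification that ignores the CPU and network cost terms a real optimizer would weigh.

```python
from dataclasses import dataclass


@dataclass
class OperatorStats:
    """Hypothetical per-operator statistics; field names are illustrative."""
    name: str
    tuples_in: int   # number of tuples the operator reads at the data source
    tuples_out: int  # number of tuples the operator produces
    bytes_in: int    # data volume accessed by the operator
    bytes_out: int   # data volume left to transmit after the operator runs

    @property
    def selectivity(self) -> float:
        # Classic selectivity factor: the fraction of tuples that survive.
        return self.tuples_out / self.tuples_in

    @property
    def vrf(self) -> float:
        # Assumed Volume Reduction Factor: the fraction of bytes still to ship.
        return self.bytes_out / self.bytes_in


def place(op: OperatorStats) -> str:
    """Simplified rule: run data-reducing operators (VRF < 1) at the data
    source; run everything else near the client."""
    return "data source" if op.vrf < 1.0 else "client"


# Hypothetical aggregate that maps each 1 MB raster image to a 16-byte
# (id, value) pair: no tuple is filtered, yet almost no bytes remain.
avg_energy = OperatorStats("AvgEnergy", 1000, 1000, 1_000_000_000, 16_000)

# Hypothetical operator that expands each 10 KB compressed value to 100 KB.
decompress = OperatorStats("Decompress", 1000, 1000, 10_000_000, 100_000_000)

for op in (avg_energy, decompress):
    print(f"{op.name}: selectivity={op.selectivity}, "
          f"VRF={op.vrf:g}, run at {place(op)}")
```

Both operators have a selectivity of 1.0, so a selectivity-only estimate cannot tell them apart. Their VRFs differ by roughly six orders of magnitude, which is enough to send the aggregate to the data source and keep the inflating operator near the client.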