Benefit and cost of query answering in PDMS

Authors:
Armin Roth;Felix Naumann
Affiliations:
Humboldt-Universität zu Berlin, Berlin, Germany;Humboldt-Universität zu Berlin, Berlin, Germany
Venue:
DBISP2P'05/06 Proceedings of the 2005/2006 international conference on Databases, information systems, and peer-to-peer computing
Year:
2005

Abstract

Peer data management systems (PDMS) are a natural extension of integrated information systems. They consist of a dynamic set of autonomous peers, each of which can mediate between the heterogeneous schemas of other peers. A new data source joins a PDMS by defining a semantic mapping to one or more other peers, thus forming a network of peers. Queries submitted to a peer are answered with data residing at that peer and with data reached along paths of mappings through the network of peers. However, without optimization, query reformulation in a PDMS is very inefficient because of redundancy in mapping paths. We present a decentralized strategy that guides each peer in deciding along which further mappings the query should be sent. The strategy uses statistics of the peer's own data and statistics of the mappings to neighboring peers to predict whether it is worthwhile to send the query to a given neighbor, or whether the query plan should be pruned at this point. These decisions are guided by a benefit and cost model that trades off the amount of data a neighbor will pass back against the execution cost of that step. Thus, a PDMS can scale up to a large number of participating peers.
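The abstract does not give the model's formulas. The following is a minimal sketch under stated assumptions: benefit is taken to be the estimated number of tuples a neighbor passes back, cost is one estimated execution cost per mapping, and a peer forwards the query only if benefit per unit cost reaches a threshold. It illustrates the general idea of a local prune-or-forward decision along mapping paths. All names (`Mapping`, `Peer`, `worthwhile`, `answer`), the ratio test, and the threshold are hypothetical and are not the authors' model.

```python
from dataclasses import dataclass, field


@dataclass
class Mapping:
    """Mapping from the current peer to a neighbor, annotated with statistics.

    Both statistics are hypothetical placeholders for whatever a peer has gathered.
    """
    target: str          # name of the neighboring peer
    est_tuples: float    # estimated tuples the neighbor will pass back (benefit)
    est_cost: float      # estimated execution cost of sending the query this way


@dataclass
class Peer:
    name: str
    local_tuples: int                                   # tuples answered from own data
    mappings: list[Mapping] = field(default_factory=list)


def worthwhile(m: Mapping, threshold: float) -> bool:
    # Hypothetical local rule: forward only if benefit per unit of cost reaches
    # the threshold; otherwise the query plan is pruned at this mapping.
    return m.est_cost > 0 and m.est_tuples / m.est_cost >= threshold


def answer(peers: dict[str, Peer], start: str, threshold: float = 1.0, visited=None):
    """Propagate a query from `start`; return (tuples collected, peers contacted)."""
    if visited is None:
        visited = set()
    visited.add(start)
    peer = peers[start]
    total, contacted = peer.local_tuples, [start]
    for m in peer.mappings:
        # Simplification (assumption): peers already reached are skipped, so a
        # redundant path to the same peer is not explored a second time.
        if m.target in visited or not worthwhile(m, threshold):
            continue
        sub_total, sub_contacted = answer(peers, m.target, threshold, visited)
        total += sub_total
        contacted += sub_contacted
    return total, contacted


# Example network (invented numbers for illustration only).
peers = {
    "A": Peer("A", 10, [Mapping("B", 50, 10), Mapping("C", 2, 20)]),
    "B": Peer("B", 40, [Mapping("C", 30, 5)]),
    "C": Peer("C", 30, []),
}
print(answer(peers, "A"))                  # (80, ['A', 'B', 'C'])
print(answer(peers, "A", threshold=10.0))  # (10, ['A'])
```

Each peer decides using only the statistics on its own outgoing mappings, which matches the decentralized setting described above. Raising the threshold prunes more aggressively and trades completeness of the answer for lower execution cost.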