XML processing in DHT networks

Authors:
Serge Abiteboul;Ioana Manolescu;Neoklis Polyzotis;Nicoleta Preda;Chong Sun
Affiliations:
INRIA Futurs & University of Paris XI, Gemo Team, 4 rue Jacques Monod, Orsay Cedex, 91893, France. serge.abiteboul@inria.fr;INRIA Futurs & University of Paris XI, Gemo Team, 4 rue Jacques Monod, Orsay Cedex, 91893, France. ioana.manolescu@inria.fr;Computer Science Department, University of California, Santa Cruz, 1156 High St, Santa Cruz, CA 95064, United States. alkis@cs.ucsc.edu;INRIA Futurs & University of Paris XI, Gemo Team, 4 rue Jacques Monod, Orsay Cedex, 91893, France. nicoleta.preda@inria.fr;Computer Science Department, University of California, Santa Cruz, 1156 High St, Santa Cruz, CA 95064, United States. sunchong@soe.ucsc.edu
Venue:
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Year:
2008

Citing 0
Cited 16

WebContent: efficient P2P Warehousing of web data
Proceedings of the VLDB Endowment

Routing of structured queries in large-scale distributed systems
Proceedings of the 2008 ACM workshop on Large-Scale distributed systems for information retrieval

Optimized union of non-disjoint distributed data sets
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology

LCA-based selection for XML document collections
Proceedings of the 19th international conference on World wide web

Selectivity-based XML query processing in structured peer-to-peer networks
Proceedings of the Fourteenth International Database Engineering & Applications Symposium

Cardinality estimation and dynamic length adaptation for Bloom filters
Distributed and Parallel Databases

Towards large-scale sharing of electronic health records of cancer patients
Proceedings of the 1st ACM International Health Informatics Symposium

ASTERIX: towards a scalable, semistructured data platform for evolving-world models
Distributed and Parallel Databases

Collaborative clustering of XML documents
Journal of Computer and System Sciences

A distributed full-text top-k document dissemination system in distributed hash tables
World Wide Web

A software tool for large-scale sharing and querying of clinical documents modeled using HL7 version 3 standard
Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium

FoXtrot: Distributed structural and value XML filtering
ACM Transactions on the Web (TWEB)

ViP2P: efficient XML management in DHT networks
ICWE'12 Proceedings of the 12th international conference on Web Engineering

Web data indexing in the cloud: efficiency and cost reductions
Proceedings of the 16th International Conference on Extending Database Technology

A new tool for sharing and querying of clinical documents modeled using HL7 Version 3 standard
Computer Methods and Programs in Biomedicine

A gossip-based approach for Internet-scale cardinality estimation of XPath queries over distributed semistructured data
The VLDB Journal — The International Journal on Very Large Data Bases

Abstract

We study the scalable management of XML data in P2P networks based on distributed hash tables (DHTs). We identify performance limitations in this context, and propose an array of techniques to lift them. First, we adapt the DHT platform's index store and communication primitives to the needs of massive data processing. Second, we introduce a distributed hierarchical index and associated efficient algorithms to speed up query processing. Third, we present an innovative, XML-specific flavor of Bloom filters, to reduce data transfers entailed by query processing. Our approach is fully implemented in the KadoP system, used in a real-life software manufacturing application. Our experiments demonstrate the benefits of the proposed techniques.