In this paper, we address the problem of estimating the cardinality of XPath queries over XML data stored in a distributed, Internet-scale environment, such as a large-scale data sharing system designed to foster innovation in biomedical and health informatics. Cardinality estimates of XPath expressions are useful in XQuery optimization, in designing IR-style relevance ranking schemes, and in statistical hypothesis testing. We present a novel gossip algorithm called XGossip, which, given an XPath query, estimates the number of XML documents in the network that contain a match for the query. XGossip is designed to be scalable, decentralized, and robust to failures, properties that are desirable in a large-scale distributed system. XGossip employs a novel divide-and-conquer strategy to balance load and reduce bandwidth consumption. We analyze XGossip theoretically in terms of the accuracy of cardinality estimation, message complexity, and bandwidth consumption. We present a comprehensive performance evaluation of XGossip on Amazon EC2 using a heterogeneous collection of XML documents.
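To make the idea of gossip-based counting concrete, here is a minimal sketch of the generic push-sum technique for estimating a network-wide total from per-node counts. It is not XGossip itself: it assumes each node already knows how many of its local documents match the query, picks peers uniformly at random, and ignores failures, XPath evaluation, and the divide-and-conquer strategy. The function name `push_sum_total` and its parameters are hypothetical and chosen only for illustration.

```python
import random


def push_sum_total(local_counts, rounds=50, seed=0):
    """Simulate push-sum gossip; return each node's estimate of sum(local_counts).

    Assumption: local_counts[i] is the number of matching documents held by
    node i (a hypothetical input, not something the abstract specifies).
    """
    rng = random.Random(seed)
    n = len(local_counts)
    # s[i] is node i's share of the total; w[i] is its share of the weight.
    s = [float(c) for c in local_counts]
    w = [0.0] * n
    w[0] = 1.0  # exactly one node starts with weight 1, so s/w tends to the sum

    for _ in range(rounds):
        new_s = [0.0] * n
        new_w = [0.0] * n
        for i in range(n):
            j = rng.randrange(n)  # uniformly random peer (may be i itself)
            half_s, half_w = s[i] / 2.0, w[i] / 2.0
            # Node i keeps one half and pushes the other half to peer j.
            new_s[i] += half_s
            new_w[i] += half_w
            new_s[j] += half_s
            new_w[j] += half_w
        s, w = new_s, new_w

    # A node that has not yet received any weight cannot form an estimate.
    return [si / wi if wi > 0 else None for si, wi in zip(s, w)]


if __name__ == "__main__":
    counts = [i % 3 for i in range(100)]  # toy per-node match counts
    estimates = push_sum_total(counts)
    print("true total:", sum(counts))
    print("node 0 estimate:", estimates[0])
```

Each round preserves the total of the `s` values and the total of the `w` values, which is why every node's ratio drifts toward the true sum without any central coordinator.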