Locating data sources in large distributed systems

Authors:
Leonidas Galanis; Yuan Wang; Shawn R. Jeffery; David J. DeWitt
Affiliations:
Computer Sciences Department, University of Wisconsin - Madison, Madison, WI (all authors)
Venue:
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Year:
2003

Citing 21
Cited 41



Abstract

Querying large numbers of data sources is gaining importance due to the increasing number of independent data providers. One of the key challenges is executing queries on all relevant information sources in a scalable fashion and retrieving fresh results. The key to scalability is to send queries only to the relevant servers and avoid wasting resources on data sources that will not provide any results. Thus, a catalog service that determines the relevant data sources for a given query is an essential component in efficiently processing queries in a distributed environment. This paper proposes a catalog framework that is distributed across the data sources themselves and does not require any central infrastructure. As new data sources become available, they automatically become part of the catalog service infrastructure, which allows scalability to large numbers of nodes. Furthermore, we propose techniques for workload adaptability. Using simulation and real-world data, we show that our approach is valid and can scale to thousands of data sources.