Enhancing P2P file-sharing with an internet-scale query processor

Authors:
Boon Thau Loo; Joseph M. Hellerstein; Ryan Huebsch; Scott Shenker; Ion Stoica
Affiliations:
UC Berkeley; UC Berkeley and Intel Research, Berkeley; UC Berkeley; UC Berkeley and International Computer Science Institute; UC Berkeley
Venue:
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Year:
2004

Citing 17

Principles of distributed database systems (2nd ed.)

Chord: A scalable peer-to-peer lookup service for internet applications
Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications

A scalable content-addressable network
Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications

Compressed bloom filters
Proceedings of the twentieth annual ACM symposium on Principles of distributed computing

Looking up data in P2P systems
Communications of the ACM

Generalized Partial Indexes
ICDE '95 Proceedings of the Eleventh International Conference on Data Engineering

Complex Queries in DHT-based Peer-to-Peer Networks
IPTPS '01 Revised Papers from the First International Workshop on Peer-to-Peer Systems

Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems
Middleware '01 Proceedings of the IFIP/ACM International Conference on Distributed Systems Platforms Heidelberg

Mariposa: a wide-area distributed database system
The VLDB Journal: The International Journal on Very Large Data Bases

Routing Indices For Peer-to-Peer Systems
ICDCS '02 Proceedings of the 22nd International Conference on Distributed Computing Systems (ICDCS'02)

Improving Search in Peer-to-Peer Networks
ICDCS '02 Proceedings of the 22nd International Conference on Distributed Computing Systems (ICDCS'02)

Making gnutella-like P2P systems scalable
Proceedings of the 2003 conference on Applications, technologies, architectures, and protocols for computer communications

Tapestry: An Infrastructure for Fault-tolerant Wide-area Location and

Should we build Gnutella on a structured overlay?
ACM SIGCOMM Computer Communication Review

Handling churn in a DHT
ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference

Querying the internet with PIER
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29

The case for a hybrid p2p search infrastructure
IPTPS'04 Proceedings of the Third international conference on Peer-to-Peer Systems

Cited 38

XPath lookup queries in P2P networks
Proceedings of the 6th annual ACM international workshop on Web information and data management

On search in peer-to-peer file sharing systems
Proceedings of the 2005 ACM symposium on Applied computing

Creating social networks to improve peer-to-peer networking
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining

Delay aware querying with seaweed
VLDB '06 Proceedings of the 32nd international conference on Very large data bases

Distributed cache table: efficient query-driven processing of multi-term queries in P2P networks
P2PIR '06 Proceedings of the international workshop on Information retrieval in peer-to-peer networks

Novel applications of information retrieval techniques to peer-to-peer file-sharing systems
P2PIR '06 Proceedings of the international workshop on Information retrieval in peer-to-peer networks

Storing and retrieving XPath fragments in structured P2P networks
Data & Knowledge Engineering - Special issue: WIDM 2004

Survey of research towards robust peer-to-peer networks: search methods
Computer Networks: The International Journal of Computer and Telecommunications Networking

Debunking some myths about structured and unstructured overlays
NSDI'05 Proceedings of the 2nd conference on Symposium on Networked Systems Design & Implementation - Volume 2

Routing Queries through a Peer-to-Peer InfoBeacons Network Using Information Retrieval Techniques
IEEE Transactions on Parallel and Distributed Systems

Efficient multi-keyword search over p2p web
Proceedings of the 17th international conference on World Wide Web

Distributed databases and peer-to-peer databases: past and present
ACM SIGMOD Record

Just-in-time query retrieval over partially indexed data on structured P2P overlays
Proceedings of the 2008 ACM SIGMOD international conference on Management of data

An Architecture for Hybrid P2P Free-Text Search
CIA '07 Proceedings of the 11th international workshop on Cooperative Information Agents XI

Speed up semantic search in p2p networks
Proceedings of the 17th ACM conference on Information and knowledge management

PHIRST: A distributed architecture for P2P information retrieval
Information Systems

Popularity adaptive search in hybrid P2P systems
Journal of Parallel and Distributed Computing

Online Querying of Concept Hierarchies in P2P Systems
OTM '08 Proceedings of the OTM 2008 Confederated International Conferences, CoopIS, DOA, GADA, IS, and ODBASE 2008. Part I on On the Move to Meaningful Internet Systems:

An optimal overlay topology for routing peer-to-peer searches
Proceedings of the ACM/IFIP/USENIX 2005 International Conference on Middleware

DHTCache: A Distributed Service to Improve the Selection of Cache Configurations within a Highly-Distributed Context
Globe '09 Proceedings of the 2nd International Conference on Data Management in Grid and Peer-to-Peer Systems

Linking identical neighborly partitions for efficient high-dimensional similarity search in unstructured peer-to-peer systems
Distributed and Parallel Databases

Structured flooding search in chord overlays
GIIS'09 Proceedings of the Second international conference on Global Information Infrastructure Symposium

LINP: supporting similarity search in unstructured peer-to-peer networks
APWeb/WAIM'07 Proceedings of the joint 9th Asia-Pacific web and 8th international conference on web-age information management conference on Advances in data and web management

Distributed ranked search
HiPC'07 Proceedings of the 14th international conference on High performance computing

The declarative imperative: experiences and conjectures in distributed logic
ACM SIGMOD Record

Proactive replication and search for rare objects in unstructured peer-to-peer networks
WAIM'10 Proceedings of the 11th international conference on Web-age information management

Online querying of d-dimensional hierarchies
Journal of Parallel and Distributed Computing

TI: an efficient indexing mechanism for real-time search on tweets
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data

Brown Dwarf: A fully-distributed, fault-tolerant data warehousing system
Journal of Parallel and Distributed Computing

Review: A survey on content-centric technologies for the current Internet: CDN and P2P solutions
Computer Communications

Proactive replication for rare objects in unstructured peer-to-peer networks
Journal of Network and Computer Applications

Searching dynamic communities with personal indexes
ISWC'05 Proceedings of the 4th international conference on The Semantic Web

Clustering peers based on contents for efficient similarity search
DASFAA'06 Proceedings of the 11th international conference on Database Systems for Advanced Applications

Distributed proximity-aware peer clustering in bittorrent-like peer-to-peer networks
EUC'06 Proceedings of the 2006 international conference on Embedded and Ubiquitous Computing

Efficient processing of XPath queries with structured overlay networks
OTM'05 Proceedings of the 2005 OTM Confederated international conference on On the Move to Meaningful Internet Systems: CoopIS, COA, and ODBASE - Volume Part II

A timed mobile agent planning approach for distributed information retrieval in dynamic network environments
Information Sciences: an International Journal

An optimal overlay topology for routing peer-to-peer searches
Middleware'05 Proceedings of the ACM/IFIP/USENIX 6th international conference on Middleware

High volumes of event stream indexing and efficient multi-keyword searching for cloud monitoring
Future Generation Computer Systems

Abstract

In this paper, we address the problem of designing a scalable, accurate query processor for peer-to-peer file-sharing and similar distributed keyword search systems. Using a globally distributed monitoring infrastructure, we perform an extensive study of the Gnutella file-sharing network, characterizing its topology, data, and query workloads. We observe that Gnutella's flooding-based query processing performs well for popular content but quite poorly for rare items with few replicas. We then consider an alternative approach based on Distributed Hash Tables (DHTs). We describe our implementation of PIERSearch, a DHT-based system, and propose a hybrid system in which Gnutella is used to locate popular items and PIERSearch handles rare items. We develop an analytical model of the two approaches and use it, together with our Gnutella traces, to study the trade-off between query recall and system overhead in the hybrid system. We evaluate a variety of localized schemes for identifying items that are rare and therefore worth handling via the DHT. Lastly, a live deployment on fifty nodes across two continents shows that PIERSearch complements Gnutella well in handling rare items.
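The hybrid idea in the abstract (flood for popular items, use a DHT keyword index for rare ones) can be illustrated with a minimal Python sketch. Every name here is hypothetical and chosen for illustration only: `ToyDHT` hashes keys onto in-memory dictionaries rather than routing over a real DHT, `KeywordIndex` is a simple keyword-to-file inverted index in the spirit of a DHT-based search system (not PIERSearch's actual schema), and the result-count threshold of 20 used in `publish_rare_items` and `hybrid_search` is an assumed stand-in for the paper's localized rare-item schemes. Flooding is passed in as a callable so that no Gnutella client is needed.

```python
import hashlib
from collections import defaultdict


def tokenize(text):
    """Naive keyword extraction: lowercase, treat '.', '_' and '-' as spaces."""
    for sep in "._-":
        text = text.replace(sep, " ")
    return text.lower().split()


class ToyDHT:
    """Hypothetical stand-in for a DHT: each key is hashed (SHA-1) onto one of
    several in-memory 'nodes'. A real DHT would route put/get over the network."""

    def __init__(self, num_nodes=8):
        self.nodes = [defaultdict(set) for _ in range(num_nodes)]

    def _owner(self, key):
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest, "big") % len(self.nodes)]

    def put(self, key, value):
        self._owner(key)[key].add(value)

    def get(self, key):
        # .get() does not trigger defaultdict's factory, so lookups never create keys.
        return set(self._owner(key).get(key, ()))


class KeywordIndex:
    """Hypothetical DHT-backed inverted index: keyword -> file IDs, file ID -> name."""

    def __init__(self, dht):
        self.dht = dht

    def publish(self, file_id, filename):
        self.dht.put("item:" + file_id, filename)
        for kw in set(tokenize(filename)):
            self.dht.put("kw:" + kw, file_id)

    def search(self, query):
        keywords = tokenize(query)
        if not keywords:
            return []
        # Conjunctive (AND) query: intersect the posting sets of all keywords.
        matches = set.intersection(*(self.dht.get("kw:" + kw) for kw in keywords))
        names = set()
        for file_id in matches:
            names |= self.dht.get("item:" + file_id)
        return sorted(names)


def publish_rare_items(shared_files, flood, index, max_results=20):
    """Hypothetical localized rarity test: publish a shared file to the DHT only
    if flooding for its name returns fewer than `max_results` hits."""
    published = []
    for file_id, filename in shared_files.items():
        if len(flood(filename)) < max_results:
            index.publish(file_id, filename)
            published.append(file_id)
    return published


def hybrid_search(query, flood, index, min_results=20):
    """Flood first; query the DHT only when flooding finds too few results."""
    results = flood(query)
    if len(results) >= min_results:
        return results, "flood"
    # Few results suggest a rare item, so fall back to the DHT index.
    return sorted(set(results) | set(index.search(query))), "flood+dht"


if __name__ == "__main__":
    popular = ["popular hit song.mp3"] * 50  # 50 hypothetical replicas

    def fake_flood(query):
        # Stand-in for Gnutella flooding over a toy network.
        kws = set(tokenize(query))
        return [name for name in popular if kws <= set(tokenize(name))]

    index = KeywordIndex(ToyDHT())
    shared = {"f1": "obscure live bootleg 1997.mp3", "f2": "popular hit song.mp3"}
    print(publish_rare_items(shared, fake_flood, index))  # ['f1']
    print(hybrid_search("hit song", fake_flood, index)[1])  # flood
    print(hybrid_search("bootleg 1997", fake_flood, index))
```

In this sketch only the rare file is published to the DHT, so the DHT carries a fraction of the index, and the query for the popular song never touches it. Raising the (assumed) thresholds sends more items and queries to the DHT, which improves recall for rare items at the cost of more publishing and lookup overhead; this is the kind of trade-off the abstract describes studying, though the paper's own model and schemes are not reproduced here.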