Hybrid global-local indexing for efficient peer-to-peer information retrieval

Authors:
Chunqiang Tang;Sandhya Dwarkadas
Affiliations:
Computer Science Department, University of Rochester;Computer Science Department, University of Rochester
Venue:
NSDI'04 Proceedings of the 1st conference on Symposium on Networked Systems Design and Implementation - Volume 1
Year:
2004

Abstract

Content-based full-text search remains a particularly challenging problem in peer-to-peer (P2P) systems. Traditionally, indexes have been partitioned in one of two ways: by document space or by keyword. The former requires searching every node in the system to answer a query, whereas the latter transmits a large amount of data when processing multi-term queries. In this paper, we propose eSearch, a P2P keyword search system based on a novel hybrid indexing structure. In eSearch, each node is responsible for certain terms. Given a document, eSearch uses a modern information retrieval algorithm to select a small number of top (important) terms in the document and publishes the document's complete term list to the nodes responsible for those top terms. This selective replication of term lists allows a multi-term query to be processed locally at the nodes responsible for the query terms. We also propose automatic query expansion to alleviate the degradation in search quality caused by selective replication, overlay source multicast to reduce the cost of disseminating term lists, and techniques to balance the distribution of term lists across nodes. eSearch is scalable and efficient, and obtains search results as good as those of state-of-the-art centralized systems. Despite its use of replication, eSearch actually consumes less bandwidth than systems based on keyword partitioning when publishing metadata for a document. During a retrieval operation, it searches only a small number of nodes and typically transmits a small amount of data (3.3 KB), which is independent of the size of the corpus and grows slowly (logarithmically) with the number of nodes in the system. eSearch's efficiency comes at a modest storage cost, 6.8 times that of systems based on keyword partitioning. This cost can be further reduced by adopting index compression or pruning techniques.
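The hybrid indexing idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: a modulo hash stands in for the term-to-node mapping, raw term frequency stands in for the unspecified information retrieval algorithm that picks top terms, and the names `HybridIndex`, `node_for_term`, `top_terms`, `NUM_NODES`, and `TOP_K` are hypothetical.

```python
import hashlib
from collections import Counter, defaultdict

NUM_NODES = 8  # hypothetical number of peers
TOP_K = 2      # hypothetical number of top terms selected per document


def node_for_term(term, num_nodes=NUM_NODES):
    """Map a term to the node responsible for it.

    Assumption: a plain modulo hash, used here only as a stand-in
    for whatever overlay lookup a real system would use.
    """
    digest = hashlib.sha1(term.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def top_terms(term_counts, k=TOP_K):
    """Pick the k most important terms.

    Assumption: importance is raw term frequency, with ties broken
    alphabetically so the result is deterministic.
    """
    ranked = sorted(term_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


class HybridIndex:
    def __init__(self):
        # nodes[node_id][term][doc_id] -> complete term list of that document
        self.nodes = defaultdict(lambda: defaultdict(dict))

    def publish(self, doc_id, text):
        """Replicate the complete term list only to the top-term nodes."""
        counts = Counter(text.lower().split())
        chosen = top_terms(counts)
        for term in chosen:
            self.nodes[node_for_term(term)][term][doc_id] = counts
        return chosen

    def search(self, query):
        """Visit only the nodes responsible for the query terms.

        Each node holds complete term lists, so it can score a document
        against the whole query without fetching data from other nodes.
        The score is a simple sum of term counts (an assumption).
        """
        q_terms = query.lower().split()
        scores = {}
        for term in set(q_terms):
            postings = self.nodes[node_for_term(term)].get(term, {})
            for doc_id, counts in postings.items():
                scores[doc_id] = sum(counts[t] for t in q_terms)
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

In this sketch, a document is found only through its selected top terms. A query made up solely of non-top terms returns nothing, which is the kind of quality loss the abstract says query expansion is meant to alleviate.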