Discovering and exploiting keyword and attribute-value co-occurrences to improve P2P routing indices

Authors:
Sebastian Michel;Matthias Bender;Nikos Ntarmos;Peter Triantafillou;Gerhard Weikum;Christian Zimmer
Affiliations:
Max-Planck-Institut für Informatik, Saarbrücken, Germany;Max-Planck-Institut für Informatik, Saarbrücken, Germany;RACTI and University of Patras, Rio, Greece;RACTI and University of Patras, Rio, Greece;Max-Planck-Institut für Informatik, Saarbrücken, Germany;Max-Planck-Institut für Informatik, Saarbrücken, Germany
Venue:
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Year:
2006

Abstract

Peer-to-Peer (P2P) search requires intelligent decisions for query routing: selecting the best peers to which a given query, initiated at some peer, should be forwarded for retrieving additional search results. These decisions are based on statistical summaries for each peer, which are usually organized on a per-keyword basis and managed in a distributed directory of routing indices. Such architectures disregard the possible correlations among keywords. Together with the coarse granularity of per-peer summaries, which are mandated for scalability, this limitation may lead to poor search result quality.

This paper develops and evaluates two solutions to this problem, sk-STAT based on single-key statistics only, and mk-STAT based on additional multi-key statistics. For both cases, hash sketch synopses are used to compactly represent a peer's data items and are efficiently disseminated in the P2P network to form a decentralized directory. Experimental studies with Gnutella and Web data demonstrate the viability and the trade-offs of the approaches.
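
The abstract does not spell out how a hash sketch is built or queried, so the following is only an illustrative sketch, assuming a Flajolet-Martin-style bitmap synopsis. The class name HashSketch, the parameters num_bitmaps and bits, the SHA-1 hash, and the inclusion-exclusion helper estimate_cooccurrence are all hypothetical choices made for this example and are not taken from the paper. It shows two properties that make such synopses attractive for per-keyword routing indices: sketches of two item sets can be merged with a bitwise OR, and an approximate count of items matching two keywords can be derived from single-keyword sketches.

```python
import hashlib

PHI = 0.77351  # Flajolet-Martin correction constant


class HashSketch:
    """Illustrative Flajolet-Martin-style sketch (not the paper's exact synopsis)."""

    def __init__(self, num_bitmaps=256, bits=32):
        self.m = num_bitmaps          # number of bitmaps (stochastic averaging)
        self.bits = bits              # width of each bitmap
        self.bitmaps = [0] * num_bitmaps

    @staticmethod
    def _hash(item):
        # Assumption: any well-mixing hash works; SHA-1 is used for determinism.
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item):
        h = self._hash(item)
        j = h % self.m                # which bitmap receives this item
        rest = h // self.m
        # Index of the least-significant 1 bit of the remaining hash bits.
        rho = (rest & -rest).bit_length() - 1 if rest else self.bits - 1
        self.bitmaps[j] |= 1 << min(rho, self.bits - 1)

    def union(self, other):
        # The sketch of a set union is the bitwise OR of the two sketches.
        if (self.m, self.bits) != (other.m, other.bits):
            raise ValueError("sketches must share parameters")
        merged = HashSketch(self.m, self.bits)
        merged.bitmaps = [a | b for a, b in zip(self.bitmaps, other.bitmaps)]
        return merged

    def estimate(self):
        # Average the position of the lowest unset bit over all bitmaps.
        total = 0
        for bitmap in self.bitmaps:
            r = 0
            while r < self.bits and (bitmap >> r) & 1:
                r += 1
            total += r
        return self.m / PHI * 2 ** (total / self.m)


def estimate_cooccurrence(sketch_a, sketch_b):
    """Rough |A and B| via inclusion-exclusion; the error can be large."""
    union_size = sketch_a.union(sketch_b).estimate()
    return max(0.0, sketch_a.estimate() + sketch_b.estimate() - union_size)


if __name__ == "__main__":
    # Hypothetical per-keyword sketches over document identifiers.
    keyword_a, keyword_b = HashSketch(), HashSketch()
    for doc in range(0, 20000):
        keyword_a.add(f"doc{doc}")
    for doc in range(10000, 30000):
        keyword_b.add(f"doc{doc}")
    print("estimated |A|:", round(keyword_a.estimate()))
    print("estimated |A and B|:", round(estimate_cooccurrence(keyword_a, keyword_b)))
```

Because the merge is a plain bitwise OR, sketches published by different peers under the same keyword can be combined without exchanging the underlying items. The inclusion-exclusion estimate compounds the error of three separate estimates, so it is a rough signal only when derived from single-key sketches in this way.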