Probabilistic counting algorithms for data base applications
Journal of Computer and System Sciences
Mining association rules between sets of items in large databases
SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Searching distributed collections with inference networks
SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
A decision-theoretic approach to database selection in networked IR
ACM Transactions on Information Systems (TOIS)
Min-wise independent permutations
Journal of Computer and System Sciences - 30th annual ACM symposium on theory of computing
Space/time trade-offs in hash coding with allowable errors
Communications of the ACM
Chord: A scalable peer-to-peer lookup service for internet applications
Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications
A scalable content-addressable network
Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications
k-RP*s: a scalable distributed data structure for high-performance multi-attribute access
DIS '96 Proceedings of the fourth international conference on on Parallel and distributed information systems
Building efficient and effective metasearch engines
ACM Computing Surveys (CSUR)
A language modeling framework for resource selection and results merging
Proceedings of the eleventh international conference on Information and knowledge management
Computing Iceberg Queries Efficiently
VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems
Middleware '01 Proceedings of the IFIP/ACM International Conference on Distributed Systems Platforms Heidelberg
FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
On the Resemblance and Containment of Documents
SEQUENCES '97 Proceedings of the Compression and Complexity of Sequences 1997
Content-based retrieval in hybrid peer-to-peer networks
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Evaluating GUESS and Non-Forwarding Peer-to-Peer Search
ICDCS '04 Proceedings of the 24th International Conference on Distributed Computing Systems (ICDCS'04)
Exploiting Semantic Proximity in Peer-to-Peer Content Searching
FTDCS '04 Proceedings of the 10th IEEE International Workshop on Future Trends of Distributed Computing Systems
Improving text collection selection with coverage and overlap statistics
WWW '05 Special interest tracks and posters of the 14th international conference on World Wide Web
Improving collection selection with overlap awareness in P2P search engines
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
Gossip-based aggregation in large dynamic networks
ACM Transactions on Computer Systems (TOCS)
Sketching streams through the net: distributed approximate query tracking
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Consistently estimating the selectivity of conjuncts of predicates
VLDB '05 Proceedings of the 31st international conference on Very large data bases
MINERVA: collaborative P2P search
VLDB '05 Proceedings of the 31st international conference on Very large data bases
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Efficient Query Evaluation on Large Textual Collections in a Peer-to-Peer Environment
P2P '05 Proceedings of the Fifth IEEE International Conference on Peer-to-Peer Computing
Answering similarity queries in peer-to-peer networks
Information Systems
Associative search in peer to peer networks: Harnessing latent semantics
Computer Networks: The International Journal of Computer and Telecommunications Networking
All of Statistics: A Concise Course in Statistical Inference
All of Statistics: A Concise Course in Statistical Inference
Attribute-Based access to distributed data over p2p networks
DNIS'05 Proceedings of the 4th international conference on Databases in Networked Information Systems
Semantic overlay networks for p2p systems
AP2PC'04 Proceedings of the Third international conference on Agents and Peer-to-Peer Computing
Web text retrieval with a P2P query-driven index
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Data allocation scheme based on term weight for P2P information retrieval
Proceedings of the 9th annual ACM international workshop on Web information and data management
CTO: concept tree based semantic overlay for pure peer-to-peer information retrieval
Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Query-driven indexing for scalable peer-to-peer text retrieval
Proceedings of the 2nd international conference on Scalable information systems
Exploiting correlated keywords to improve approximate information filtering
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Query-driven indexing for scalable peer-to-peer text retrieval
Future Generation Computer Systems
Adaptive distributed indexing for structured peer-to-peer networks
Proceedings of the 17th ACM conference on Information and knowledge management
ACM Transactions on Computer Systems (TOCS)
A scalable and effective full-text search in P2P networks
Proceedings of the 18th ACM conference on Information and knowledge management
Statistical structures for Internet-scale data management
The VLDB Journal — The International Journal on Very Large Data Bases
DELOS'07 Proceedings of the 1st international conference on Digital libraries: research and development
An evaluation measure for distributed information retrieval systems
ECIR'08 Proceedings of the IR research, 30th European conference on Advances in information retrieval
Flood little, cache more: effective result-reuse in P2P IR systems
DASFAA'08 Proceedings of the 13th international conference on Database systems for advanced applications
PCIR: Combining DHTs and peer clusters for efficient full-text P2P indexing
Computer Networks: The International Journal of Computer and Telecommunications Networking
A peer-selection algorithm for information retrieval
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
KMV-peer: a robust and adaptive peer-selection algorithm
Proceedings of the fourth ACM international conference on Web search and data mining
Peer-to-peer web search: euphoria, achievements, disillusionment, and future opportunities
From active data management to event-based systems and more
On the dimensioning of an aggregation service for P2P service overlay networks
AIMS'11 Proceedings of the 5th international conference on Autonomous infrastructure, management, and security: managing the dynamics of networks and services
Probabilistic deduplication for cluster-based storage systems
Proceedings of the Third ACM Symposium on Cloud Computing
MinervaDL: an architecture for information retrieval and filtering in distributed digital libraries
ECDL'07 Proceedings of the 11th European conference on Research and Advanced Technology for Digital Libraries
Hi-index | 0.00 |
Peer-to-Peer (P2P) search requires intelligent decisions for query routing: selecting the best peers to which a given query, initiated at some peer, should be forwarded for retrieving additional search results. These decisions are based on statistical summaries for each peer, which are usually organized on a per-keyword basis and managed in a distributed directory of routing indices. Such architectures disregard the possible correlations among keywords. Together with the coarse granularity of per-peer summaries, which are mandated for scalability, this limitation may lead to poor search result quality.This paper develops and evaluates two solutions to this problem, sk-STAT based on single-key statistics only, and mk-STAT based on additional multi-key statistics. For both cases, hash sketch synopses are used to compactly represent a peer's data items and are efficiently disseminated in the P2P network to form a decentralized directory. Experimental studies with Gnutella and Web data demonstrate the viability and the trade-offs of the approaches.