We consider a collaboration of peers that autonomously crawl the Web. A pivotal issue when designing a peer-to-peer (P2P) Web search engine in this environment is query routing: selecting a small subset of peers, out of a potentially very large number of relevant ones, to contact in order to satisfy a keyword query. Existing approaches to query routing work well on disjoint data sets. In practice, however, the peers’ data collections often overlap heavily, because popular documents are crawled by many peers. Techniques for estimating the cardinality of the overlap between sets that are designed for, and incorporated into, information retrieval engines are largely lacking. In this paper we present a comprehensive evaluation of appropriate overlap estimators and show how they can be incorporated into an efficient, iterative approach to query routing, coined Integrated Quality Novelty (IQN). We further propose to enhance our approach using histograms, which combine overlap estimation with the available score/ranking information. Finally, we conduct a performance evaluation in MINERVA, our prototype P2P Web search engine.
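To make the idea of overlap-aware, iterative routing concrete, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes min-wise hashing as the overlap estimator (one of several possible synopses), assumes each peer publishes a quality score, a synopsis, and its collection size, and assumes that quality and estimated novelty are combined by simple multiplication. All names (`synopsis`, `estimate_novelty`, `route`, `NUM_HASHES`) are hypothetical and chosen for illustration only.

```python
import hashlib

NUM_HASHES = 64  # assumed synopsis length; larger values reduce estimation error


def _hash(seed, doc_id):
    """Deterministic 64-bit hash of a document id under a given seed."""
    digest = hashlib.sha256(f"{seed}:{doc_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def synopsis(doc_ids):
    """Min-wise synopsis of a non-empty collection: one minimum per seeded hash."""
    docs = list(doc_ids)
    return [min(_hash(seed, d) for d in docs) for seed in range(NUM_HASHES)]


def resemblance(syn_a, syn_b):
    """Fraction of agreeing positions, an estimate of |A ∩ B| / |A ∪ B|."""
    return sum(a == b for a, b in zip(syn_a, syn_b)) / len(syn_a)


def estimate_novelty(covered_syn, covered_size, peer_syn, peer_size):
    """Estimated number of a peer's documents not yet in the covered set."""
    if covered_syn is None:  # nothing selected yet, so everything is novel
        return float(peer_size)
    r = resemblance(covered_syn, peer_syn)
    # From R = I / (|A| + |B| - I) it follows that I = R (|A| + |B|) / (1 + R).
    overlap = r * (covered_size + peer_size) / (1 + r)
    return max(0.0, peer_size - min(overlap, peer_size))


def route(peers, max_peers):
    """Greedily pick peers by quality * estimated novelty (an assumed combination).

    peers maps a peer name to a (quality, synopsis, size) tuple.
    """
    remaining = dict(peers)
    selected = []
    covered_syn, covered_size = None, 0.0
    while remaining and len(selected) < max_peers:
        def score(name):
            quality, syn, size = remaining[name]
            return quality * estimate_novelty(covered_syn, covered_size, syn, size)

        best = max(remaining, key=score)
        _, syn, size = remaining.pop(best)
        covered_size += estimate_novelty(covered_syn, covered_size, syn, size)
        # The position-wise minimum of two synopses is the synopsis of the union.
        covered_syn = syn if covered_syn is None else [
            min(a, b) for a, b in zip(covered_syn, syn)
        ]
        selected.append(best)
    return selected


# Illustrative data: peer B duplicates peer A, peer C holds different documents.
peers = {
    "A": (1.0, synopsis(range(1000)), 1000),
    "B": (0.9, synopsis(range(1000)), 1000),
    "C": (0.5, synopsis(range(1000, 1500)), 500),
}
selected = route(peers, max_peers=2)
```

In this toy setup a quality-only ranking would contact A and then B, although B contributes no new documents. The overlap-aware loop instead prefers C in the second step, because its estimated novelty with respect to what A already covers is high.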