ACM SIGIR Forum
GlOSS: text-source discovery over the Internet
ACM Transactions on Database Systems (TODS)
Chord: A scalable peer-to-peer lookup service for internet applications
Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications
Search and replication in unstructured peer-to-peer networks
ICS '02 Proceedings of the 16th international conference on Supercomputing
A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Making gnutella-like P2P systems scalable
Proceedings of the 2003 conference on Applications, technologies, architectures, and protocols for computer communications
Keyword Search in DHT-Based Peer-to-Peer Networks
ICDCS '05 Proceedings of the 25th IEEE International Conference on Distributed Computing Systems
Thumbs up?: sentiment classification using machine learning techniques
EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
Enhancing P2P file-sharing with an internet-scale query processor
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Efficient peer-to-peer keyword searching
Proceedings of the ACM/IFIP/USENIX 2003 International Conference on Middleware
The case for a hybrid p2p search infrastructure
IPTPS'04 Proceedings of the Third international conference on Peer-to-Peer Systems
Tapestry: a resilient global-scale overlay for service deployment
IEEE Journal on Selected Areas in Communications
Crawling BitTorrent DHTs for fun and profit
WOOT'10 Proceedings of the 4th USENIX conference on Offensive technologies
Hi-index | 0.00 |
Recent advances in peer to peer (P2P) search algorithms have presented viable structured and unstructured approaches for full-text search. We posit that these existing approaches are each best suited for different types of queries. We present PHIRST, the first system to facilitate effective full-text search within P2P networks. PHIRST works by effectively leveraging between the relative strengths of these approaches. Similar to structured approaches, agents first publish terms within their stored documents. However, frequent terms are quickly identified and not exhaustively stored, resulting in a significantly reduction in the system's storage requirements. During query lookup, agents use unstructured searches to compensate for the lack of fully published terms. Additionally, they explicitly weigh between the costs involved with structured and unstructured approaches, allowing for a significant reduction in query costs. We evaluated the effectiveness of our approach using both real-world and artificial queries. We found that in most situations our approach yields near perfect recall. We discuss the limitations of our system, as well as possible compensatory strategies.