This paper addresses the challenging problem of similarity search over widely distributed, ultra-high-dimensional data. One such application is the retrieval of the top-k most similar documents in a widely distributed document collection, as in the case of digital libraries. Peer-to-peer (P2P) systems emerge as a promising solution for content management over highly distributed data collections. We propose a self-organizing P2P approach in which an unstructured P2P network evolves into a super-peer architecture, with each super-peer responsible for peers with similar content. Our approach is based on distributed clustering of peer contents, which yields high-quality clusters that span the entire network. More importantly, we show how to process similarity queries efficiently by capitalizing on the newly constructed, clustered super-peer network. During query processing, the query is propagated to only a few carefully selected super-peers that are able to return high-quality results. We evaluate the performance of our approach and demonstrate its advantages through simulation experiments on two document collections.
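The query-routing idea described above can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes that each super-peer summarizes its cluster with a single centroid vector, that documents and queries are sparse term-weight dictionaries, and that similarity is cosine similarity. The names `cosine`, `route_query`, `fanout`, and the toy super-peers `sp_db` and `sp_net` are hypothetical and chosen only for illustration.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route_query(query, super_peers, fanout, k):
    """Forward the query to the `fanout` super-peers whose (assumed) cluster
    centroid is closest to it, then merge their documents into a top-k list."""
    # Step 1: select only a few super-peers, ranked by centroid similarity.
    chosen = heapq.nlargest(
        fanout,
        super_peers,
        key=lambda name: cosine(query, super_peers[name]["centroid"]),
    )
    # Step 2: score documents held under the selected super-peers only.
    candidates = []
    for name in chosen:
        for doc_id, vec in super_peers[name]["docs"].items():
            candidates.append((cosine(query, vec), doc_id))
    # Step 3: keep the k most similar documents overall.
    top = heapq.nlargest(k, candidates)
    return chosen, [doc_id for _, doc_id in top]


# Toy network with two hypothetical super-peers (illustrative data only).
super_peers = {
    "sp_db": {
        "centroid": {"index": 1.0, "query": 1.0},
        "docs": {"d1": {"index": 1.0}, "d2": {"query": 1.0, "index": 1.0}},
    },
    "sp_net": {
        "centroid": {"routing": 1.0, "overlay": 1.0},
        "docs": {"d3": {"routing": 1.0}},
    },
}
query = {"query": 1.0, "index": 0.5}
chosen, results = route_query(query, super_peers, fanout=1, k=1)
print(chosen, results)
```

In this toy setup the query is forwarded to a single super-peer, so documents under the other super-peer are never scored. Raising `fanout` trades additional messages for a better chance of finding relevant documents that ended up in other clusters.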