Estimating peer similarity using distance of shared files

Authors:
Yuval Shavitt; Ela Weinsberg; Udi Weinsberg
Affiliations:
Tel-Aviv University, Israel; Tel-Aviv University, Israel; Tel-Aviv University, Israel
Venue:
IPTPS'10 Proceedings of the 9th international conference on Peer-to-peer systems
Year:
2010

Abstract

Peer-to-peer (p2p) networks are used by millions of users for sharing content. As these networks grow more popular, it becomes increasingly difficult to find useful content among the abundance of shared files. Modern p2p networks and similar social services must adopt new methods to help users locate content efficiently, and to this end approximate meta-data search and recommendation systems are used. However, meta-data is often missing or wrong, and recommender systems are poorly suited to p2p networks because of inherent difficulties such as implicit ranking, noise in user-generated content, and the extreme dimensions and sparseness of the network. This paper attempts to bridge this gap by suggesting a new metric for peer similarity, which can be used to improve content search and recommendation in large-scale p2p networks and semi-centralized services, such as p2p IPTV. Unlike commonly used vector distance functions, which are shown to be ill-suited to p2p networks due to low overlap between peers, this work leverages a file similarity graph to estimate the similarity between peers that have little or no overlap in shared files. Using 100k peers sharing over 500k songs in the Gnutella network, we show the advantages of the proposed metric over commonly used geographical locality and vector distance measures.
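
The contrast drawn in the abstract, between overlap-based vector measures and a distance computed over a file similarity graph, can be illustrated with a small sketch. The abstract does not define the metric itself, so everything below is an assumption made for illustration: the edge weight (the inverse of the number of peers sharing both files), the peer-to-peer distance (the mean distance from each file of one peer to the nearest file of the other), and all function and variable names are hypothetical and are not taken from the paper.

```python
import heapq
from collections import defaultdict
from itertools import combinations

INF = float("inf")

def build_file_graph(peers):
    """Assumed weighting: edge weight = 1 / (number of peers sharing both files)."""
    co_occurrence = defaultdict(int)
    for files in peers.values():
        for a, b in combinations(sorted(files), 2):
            co_occurrence[(a, b)] += 1
    graph = defaultdict(dict)
    for (a, b), count in co_occurrence.items():
        graph[a][b] = graph[b][a] = 1.0 / count
    return graph

def shortest_distances(graph, source):
    """Dijkstra's algorithm from one file to every reachable file."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, INF):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, INF):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def jaccard(files_a, files_b):
    """Overlap-based baseline: zero whenever two peers share no file."""
    union = files_a | files_b
    return len(files_a & files_b) / len(union) if union else 0.0

def graph_peer_distance(graph, files_a, files_b):
    """Assumed peer distance: mean distance from each file of A to its nearest file of B."""
    total = 0.0
    for f in files_a:
        dist = shortest_distances(graph, f)
        total += min(dist.get(g, INF) for g in files_b)
    return total / len(files_a)

# Toy data (hypothetical): p1 and p3 share no file but are linked through p2.
peers = {
    "p1": {"song_a", "song_b"},
    "p2": {"song_b", "song_c"},
    "p3": {"song_c", "song_d"},
    "p4": {"song_x", "song_y"},
}
graph = build_file_graph(peers)

print(jaccard(peers["p1"], peers["p3"]))                     # 0.0
print(jaccard(peers["p1"], peers["p4"]))                     # 0.0
print(graph_peer_distance(graph, peers["p1"], peers["p3"]))  # 1.5
print(graph_peer_distance(graph, peers["p1"], peers["p4"]))  # inf
```

In this toy example the overlap measure gives the same score of zero to both pairs, while the graph-based distance separates a peer that is reachable through shared neighbours (p3) from one that is entirely disconnected (p4). That is the kind of low-overlap case the abstract describes, though the measure evaluated in the paper may differ from this sketch.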