Satrap: data and network heterogeneity aware P2P data-mining

Authors:
Hock Hee Ang;Vivekanand Gopalkrishnan;Anwitaman Datta;Wee Keong Ng;Steven C. H. Hoi
Affiliations:
Nanyang Technological University, Singapore;Nanyang Technological University, Singapore;Nanyang Technological University, Singapore;Nanyang Technological University, Singapore;Nanyang Technological University, Singapore
Venue:
PAKDD'10 Proceedings of the 14th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part II
Year:
2010

Abstract

Distributed classification aims to build an accurate classifier by learning from distributed data while reducing computation and communication cost. A P2P network, where numerous users come together to share resources such as data content, bandwidth, storage space, and CPU resources, is an excellent platform for distributed classification. However, two important aspects of the learning environment have often been overlooked by other works, viz., 1) the location of the peers, which results in variable communication cost, and 2) the heterogeneity of the peers' data, which can help reduce redundant communication. In this paper, we examine the properties of network and data heterogeneity and propose a simple yet efficient P2P classification approach that minimizes expensive inter-region communication while achieving good generalization performance. Experimental results demonstrate the feasibility and effectiveness of the proposed solution.