A distributed learning framework for heterogeneous data sources

  • Authors:
  • Srujana Merugu; Joydeep Ghosh

  • Affiliations:
  • The University of Texas at Austin; The University of Texas at Austin

  • Venue:
  • Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (KDD '05)
  • Year:
  • 2005


Abstract

We present a probabilistic model-based framework for distributed learning that takes into account privacy restrictions and is applicable to scenarios where the different sites have diverse, possibly overlapping subsets of features. Our framework decouples data privacy issues from knowledge integration issues by requiring the individual sites to share only privacy-safe probabilistic models of the local data, which are then integrated to obtain a global probabilistic model based on the union of the features available at all the sites. We provide a mathematical formulation of the model integration problem using the maximum likelihood and maximum entropy principles and describe iterative algorithms that are guaranteed to converge to the optimal solution. For certain commonly occurring special cases involving hierarchically ordered feature sets or conditional independence, we obtain closed form solutions and use these to propose an efficient alternative scheme by recursive decomposition of the model integration problem. To address interpretability concerns, we also present a modified formulation where the global model is assumed to belong to a specified parametric family. Finally, to highlight the generality of our framework, we provide empirical results for various learning tasks such as clustering and classification on different kinds of datasets consisting of continuous vector, categorical and directional attributes. The results show that high quality global models can be obtained without much loss of privacy.
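The abstract mentions closed-form solutions when the global model satisfies a conditional-independence structure over the shared features. As a rough illustration of that idea, and not the paper's actual algorithm, the sketch below combines two local Gaussian models over overlapping feature subsets {x0, x1} and {x1, x2} into a global Gaussian over {x0, x1, x2} using the standard maximum-entropy completion p(x0, x1, x2) = pA(x0, x1) pB(x1, x2) / p(x1), which makes x0 and x2 conditionally independent given the shared feature x1. All numbers, and the helper names gaussian_to_canonical and pad, are invented for this example; it assumes the two sites agree on the marginal of the shared feature.

```python
import numpy as np

def gaussian_to_canonical(mu, Sigma):
    """Moment form (mu, Sigma) -> canonical form (J, h),
    where J = inv(Sigma) and h = J @ mu."""
    J = np.linalg.inv(Sigma)
    return J, J @ mu

def pad(M, idx, dim):
    """Embed a local precision matrix or potential vector, defined over
    the feature indices `idx`, into the dim-dimensional global space."""
    if M.ndim == 2:
        out = np.zeros((dim, dim))
        out[np.ix_(idx, idx)] = M
    else:
        out = np.zeros(dim)
        out[idx] = M
    return out

# Local privacy-safe models: site A shares a Gaussian over (x0, x1),
# site B shares a Gaussian over (x1, x2). Their marginals over the
# shared feature x1 agree (mean 1.0, variance 1.0) by construction.
mu_a, Sigma_a = np.array([0.0, 1.0]),  np.array([[2.0, 0.5], [0.5, 1.0]])
mu_b, Sigma_b = np.array([1.0, -1.0]), np.array([[1.0, 0.3], [0.3, 1.5]])

# Shared-feature marginal p(x1), taken from site A's model.
mu_s, Sigma_s = mu_a[1:], Sigma_a[1:, 1:]

J_a, h_a = gaussian_to_canonical(mu_a, Sigma_a)
J_b, h_b = gaussian_to_canonical(mu_b, Sigma_b)
J_s, h_s = gaussian_to_canonical(mu_s, Sigma_s)

# Max-entropy combination in canonical form: add the padded local
# parameters and subtract the shared marginal's to avoid counting
# the x1 information twice.
dim = 3
J = pad(J_a, [0, 1], dim) + pad(J_b, [1, 2], dim) - pad(J_s, [1], dim)
h = pad(h_a, [0, 1], dim) + pad(h_b, [1, 2], dim) - pad(h_s, [1], dim)

Sigma = np.linalg.inv(J)   # global covariance over (x0, x1, x2)
mu = Sigma @ h             # global mean

print("global mean:", mu)
print("global covariance:\n", Sigma)
```

By construction the global model reproduces each site's marginal exactly, and among all joints with those marginals it is the one with maximum entropy, so it adds no dependence between x0 and x2 beyond what flows through x1. The paper's framework generalizes well beyond this toy case, handling iterative solutions when no closed form exists and model families other than Gaussians.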