Building predictors from vertically distributed data

Authors:
Sabine McConnell;David B. Skillicorn
Affiliations:
School of Computing, Queen's University, Kingston, Canada;School of Computing, Queen's University, Kingston, Canada
Venue:
CASCON '04 Proceedings of the 2004 conference of the Centre for Advanced Studies on Collaborative research
Year:
2004

Abstract

Partly because of the large volume of data available today, but more importantly because of privacy concerns, data are often distributed across institutional, geographical and organizational boundaries rather than stored in a centralized location. Data can be distributed by separating objects or attributes: in the homogeneous case, each site holds a subset of the objects with all attributes, while in the heterogeneous case each site holds a subset of the attributes for all objects. Ensemble approaches combine the results of a number of classifiers to produce a final classification. In this paper, we present a novel ensemble approach in which data are partitioned by attributes. We show that this method can be applied successfully to a wide range of data and can even increase classification accuracy compared to a centralized technique. As an ensemble approach, our technique exchanges models or classification results instead of raw data, which makes it suitable for privacy-preserving data mining as well as for physically distributed data. In addition, both final model size and runtime are typically reduced compared to a centralized model. The proposed technique is evaluated using a decision tree, a variety of datasets, and several voting schemes.
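
The general idea of an attribute-partitioned ensemble can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: a one-level decision tree (a decision stump) stands in for the decision tree learner, simple majority voting stands in for the voting schemes, and all function names (`vertical_partition`, `train_stump`, `ensemble_predict`) and the toy data are hypothetical.

```python
from collections import Counter

def vertical_partition(rows, attribute_groups):
    """Heterogeneous case: every site sees all objects but only its own attributes."""
    return [[[row[i] for i in group] for row in rows] for group in attribute_groups]

def train_stump(X, y):
    """Fit a one-level decision tree: the (attribute, threshold) split with the
    fewest training errors. Assumption: a stand-in for a full decision tree."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [label for row, label in zip(X, y) if row[j] <= t]
            right = [label for row, label in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = (sum(l != left_label for l in left)
                      + sum(l != right_label for l in right))
            if best is None or errors < best[0]:
                best = (errors, j, t, left_label, right_label)
    if best is None:  # no usable split: always predict the majority class
        majority = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]

def predict_stump(stump, row):
    j, t, left_label, right_label = stump
    return left_label if row[j] <= t else right_label

def ensemble_predict(stumps, attribute_groups, row):
    """Each site classifies using only its own attributes; only the votes are
    combined (simple majority), so no raw attribute values leave a site."""
    votes = [predict_stump(stump, [row[i] for i in group])
             for stump, group in zip(stumps, attribute_groups)]
    return Counter(votes).most_common(1)[0][0]

# Toy data invented for illustration: 6 objects, 4 attributes, 2 classes.
rows = [
    [1.0, 5.0, 0.2, 9.0],
    [1.2, 4.8, 0.1, 8.5],
    [0.9, 5.2, 0.3, 9.1],
    [3.0, 1.0, 0.9, 2.0],
    [3.1, 1.2, 0.8, 2.2],
    [2.9, 0.9, 0.7, 1.9],
]
labels = ["a", "a", "a", "b", "b", "b"]
groups = [[0, 1], [2], [3]]  # three sites, each holding a disjoint attribute subset

site_views = vertical_partition(rows, groups)
stumps = [train_stump(view, labels) for view in site_views]  # one local model per site

print(ensemble_predict(stumps, groups, [1.1, 5.0, 0.25, 8.8]))
print(ensemble_predict(stumps, groups, [3.0, 1.0, 0.2, 2.0]))
```

In the second query the sites disagree (one site votes for class "a", two for class "b"), which is the situation a voting scheme has to resolve. This sketch assumes every site can see the class labels during training; whether that holds depends on the deployment.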