Owing in part to the large volume of data available today, but more importantly to privacy concerns, data are often distributed across institutional, geographical and organizational boundaries rather than stored in a centralized location. Data can be distributed by separating objects or attributes: in the homogeneous case, each site holds a subset of the objects with all attributes, while in the heterogeneous case each site holds a subset of the attributes for all objects. Ensemble approaches combine the results of a number of classifiers to obtain a final classification. In this paper, we present a novel ensemble approach in which data are partitioned by attributes. We show that this method can be applied successfully to a wide range of data and can even increase classification accuracy compared to a centralized technique. As an ensemble approach, our technique exchanges models or classification results instead of raw data, which makes it suitable for privacy-preserving data mining as well as for physically distributed data. In addition, both final model size and runtime are typically reduced compared to a centralized model. The proposed technique is evaluated using a decision tree, a variety of datasets, and several voting schemes.
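
The attribute-partitioned setting described above can be illustrated with a minimal sketch. It is not the authors' implementation and makes several assumptions: a one-level decision tree (a stump) stands in for the full decision tree, plain majority voting stands in for the voting schemes evaluated, and the toy data and all function names (`partition_by_attributes`, `train_stump`, `predict_stump`, `majority_vote`) are hypothetical and invented for illustration. Each site sees every object but only its own attributes, and only predicted labels leave a site.

```python
from collections import Counter


def partition_by_attributes(rows, groups):
    """Heterogeneous (vertical) split: each site gets all objects, some columns."""
    return [[[row[j] for j in cols] for row in rows] for cols in groups]


def train_stump(X, y):
    """Fit a one-level decision tree on a site's local attributes.

    Assumption: a stump replaces the full decision tree used in the paper.
    Returns (column, threshold, left_label, right_label).
    """
    best = None
    for col in range(len(X[0])):
        for thr in sorted({row[col] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[col] <= thr]
            right = [lab for row, lab in zip(X, y) if row[col] > thr]
            if not left or not right:
                continue  # a split must send objects to both sides
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
            if best is None or errors < best[0]:
                best = (errors, col, thr, l_lab, r_lab)
    if best is None:  # no usable split: always predict the majority class
        majority = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]


def predict_stump(stump, row):
    col, thr, l_lab, r_lab = stump
    return l_lab if row[col] <= thr else r_lab


def majority_vote(votes):
    """Combine the labels proposed by the sites for one object."""
    return Counter(votes).most_common(1)[0][0]


# Toy data (invented): six objects, three attributes, binary class label.
rows = [
    [1, 8, 2],
    [2, 1, 3],
    [8, 2, 4],
    [7, 7, 8],
    [9, 9, 1],
    [10, 10, 10],
]
labels = [0, 0, 0, 1, 1, 1]

# Three sites, each holding one attribute for all objects.
groups = [[0], [1], [2]]
site_data = partition_by_attributes(rows, groups)

# Each site trains locally; raw attribute values never leave the site.
stumps = [train_stump(X, labels) for X in site_data]

# Only the predicted labels are exchanged and combined by voting.
site_predictions = [
    [predict_stump(stump, row) for row in X] for stump, X in zip(stumps, site_data)
]
ensemble = [majority_vote(votes) for votes in zip(*site_predictions)]
print(ensemble)
```

On this toy data each site misclassifies a different object, so the vote corrects the individual errors. The sketch predicts on its own training objects and therefore shows only the mechanics of the scheme, not the accuracy results reported in the paper.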