Introduction to data compression
Introduction to data compression
Efficient progressive sampling
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Comparison of neural networks and discriminant analysis in predicting forest cover types
Comparison of neural networks and discriminant analysis in predicting forest cover types
An agent-based framework for distributed learning
Engineering Applications of Artificial Intelligence
Cluster integration for the cluster-based instance selection
ICCCI'10 Proceedings of the Second international conference on Computational collective intelligence: technologies and applications - Volume PartI
A new cluster-based instance selection algorithm
KES-AMSTA'11 Proceedings of the 5th KES international conference on Agent and multi-agent systems: technologies and applications
ICCCI'11 Proceedings of the Third international conference on Computational collective intelligence: technologies and applications - Volume Part II
Hi-index | 0.00 |
The objective of data reduction is to obtain a compact representation of a large data set to facilitate repeated use of non-redundant information with complex and slow learning algorithms and to allow efficient data transfer and storage. For a user-controllable allowed accuracy loss we propose an effective data reduction procedure based on guided sampling for identifying a minimal size representative subset, followed by a model-sensitivity analysis for determining an appropriate compression level for each attribute. Experiments were performed on 3 large data sets and, depending on an allowed accuracy loss margin ranging from 1percnt to 5percnt of the ideal generalization, the achieved compression rates ranged between 95 and 12,500 times. These results indicate that transferring reduced data sets from multiple locations to a centralized site for an efficient and accurate knowledge discovery might often be possible in practice.