This paper addresses the problem of concept sampling. In many real-world applications, a large collection of mixed concepts is available for decision making. However, the collection is often so large that it is difficult, if not unrealistic, to use those concepts directly because of domain-specific limits on available space or time. This naturally creates a need for concept reduction. In this paper, we introduce the novel problem of concept sampling: finding, in advance, the optimal subset of a large collection of mixed concepts so that the performance of future decision making is best preserved by selectively combining the concepts remaining in the subset. The problem is formulated as an optimization process based on our derivation of a target function, which establishes a clear connection between the composition of the concept subset and the expected error of future decision making on that subset. Based on this target function, a sampling algorithm is developed and its effectiveness is discussed. Extensive empirical studies suggest that the proposed concept sampling method preserves decision-making performance well while dramatically reducing the number of concepts maintained, which justifies its usefulness in handling large-scale mixed concepts.