Concept sampling: towards systematic selection in large-scale mixed concepts in machine learning

Authors:
Yi Zhang;Xiaoming Jin
Affiliations:
School of Software, Tsinghua University, Beijing, China and Department of Computer Science, Tsinghua University, Beijing, China;School of Software, Tsinghua University, Beijing, China
Venue:
IJCAI'07: Proceedings of the 20th International Joint Conference on Artificial Intelligence
Year:
2007

Abstract

This paper addresses the problem of concept sampling. In many real-world applications, a large collection of mixed concepts is available for decision making. However, the collection is often so large that using those concepts directly is difficult, if not unrealistic, given domain-specific limits on available space or time. This naturally creates a need for concept reduction. In this paper, we introduce the novel problem of concept sampling: finding, in advance, the optimal subset of a large collection of mixed concepts so that the performance of future decision making is best preserved by selectively combining the concepts remaining in the subset. The problem is formulated as an optimization process based on our derivation of a target function, which establishes a clear connection between the composition of the concept subset and the expected error of future decision making on that subset. Based on this target function, a sampling algorithm is developed and its effectiveness is discussed. Extensive empirical studies suggest that the proposed concept sampling method preserves the performance of decision making well while dramatically reducing the number of concepts maintained, which justifies its usefulness in handling large-scale mixed concepts.
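
To make the problem statement concrete, the following is a minimal illustrative sketch of selecting a small concept subset that keeps decision error low. It is not the paper's algorithm: the abstract does not give the target function, so the sketch substitutes the held-out error of a majority vote as a stand-in objective and uses plain greedy forward selection. All names (`Concept`, `vote_error`, `greedy_concept_sample`) and the toy data are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Assumption: a "concept" is modelled as a classifier mapping an input to a label.
Concept = Callable[[float], int]
Sample = Tuple[float, int]


def vote_error(subset: Sequence[Concept], data: Sequence[Sample]) -> float:
    """Fraction of labeled points misclassified by a majority vote over `subset`.

    Stand-in objective only; ties go to the label voted first in subset order.
    """
    wrong = 0
    for x, y in data:
        votes = Counter(c(x) for c in subset)
        if votes.most_common(1)[0][0] != y:
            wrong += 1
    return wrong / len(data)


def greedy_concept_sample(
    concepts: Sequence[Concept], data: Sequence[Sample], k: int
) -> List[int]:
    """Greedily pick up to k concept indices, each step minimising vote_error."""
    chosen: List[int] = []
    remaining = list(range(len(concepts)))
    for _ in range(min(k, len(concepts))):
        best = min(
            remaining,
            key=lambda i: vote_error([concepts[j] for j in chosen + [i]], data),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy collection of three concepts; the true label is 1 when x >= 5.
def threshold_at_5(x: float) -> int:
    return int(x >= 5)


def always_one(x: float) -> int:
    return 1


def threshold_at_2(x: float) -> int:
    return int(x >= 2)


concepts = [threshold_at_5, always_one, threshold_at_2]
data = [(float(x), int(x >= 5)) for x in range(10)]
selected = greedy_concept_sample(concepts, data, k=1)
```

Greedy selection is used here only because it is simple to read; it gives no optimality guarantee, and an objective derived from expected future error, as the abstract describes, would replace `vote_error` in a faithful implementation.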