Internet-based crowdsourcing systems can be viewed as a kind of loosely coupled social network. With these systems, it is easy to collect multiple noisy labels for the same object when annotating data for supervised learning. Because non-expert labelers may lack expertise and dedication and may have strong personal preferences, their labels can be biased. This bias gives rise to Imbalanced Multiple Noisy Labeling. In this paper, we propose an agnostic algorithm, Positive LAbel frequency Threshold (PLAT), to deal with imbalanced labeling. Because of the dynamics of social networks, in most cases no information about labeler quality or the underlying class distribution can be acquired; accordingly, PLAT requires no prior knowledge of the labelers' labeling quality, the underlying class distribution, or the level of labeling imbalance. Simulations on eight real-world datasets with different underlying class distributions demonstrate that PLAT not only handles imbalanced multiple noisy labeling effectively, where off-the-shelf agnostic methods fail, but also performs nearly as well as majority voting when labelers are unbiased.
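The algorithm's name suggests that PLAT integrates each example's multiple noisy labels by thresholding its positive label frequency, whereas majority voting amounts to a fixed threshold of 0.5. The Python sketch below illustrates that contrast under stated assumptions; it is not the authors' algorithm. The abstract does not say how PLAT chooses its threshold, so here the threshold is simply passed in as a parameter. The function names and the sensitivity/specificity model used to simulate biased labelers are hypothetical.

```python
import random
from typing import List, Sequence


def positive_label_frequency(labels: Sequence[int]) -> float:
    """Fraction of one example's noisy labels that are positive (1)."""
    if not labels:
        raise ValueError("each example needs at least one label")
    return sum(labels) / len(labels)


def integrate_by_threshold(label_sets: Sequence[Sequence[int]], threshold: float) -> List[int]:
    """Label an example positive when its positive label frequency >= threshold.

    Hypothetical helper: the threshold is supplied by the caller, not estimated
    the way PLAT would estimate it.
    """
    return [1 if positive_label_frequency(ls) >= threshold else 0 for ls in label_sets]


def majority_vote(label_sets: Sequence[Sequence[int]]) -> List[int]:
    """Majority voting as the special case threshold = 0.5 (ties go to positive here)."""
    return integrate_by_threshold(label_sets, 0.5)


def simulate_labels(true_labels: Sequence[int], n_labelers: int,
                    sensitivity: float, specificity: float, seed: int = 0) -> List[List[int]]:
    """Hypothetical biased-labeler model: a label is correct with probability
    `sensitivity` on positive examples and `specificity` on negative ones.
    sensitivity < specificity means labelers lean toward the negative class."""
    rng = random.Random(seed)
    label_sets = []
    for y in true_labels:
        p_correct = sensitivity if y == 1 else specificity
        # Each labeler independently keeps the true label or flips it.
        label_sets.append([y if rng.random() < p_correct else 1 - y
                           for _ in range(n_labelers)])
    return label_sets


if __name__ == "__main__":
    # Four examples, five labelers each; labelers are reluctant to say "positive".
    noisy = [[1, 0, 0, 0, 0], [1, 1, 0, 0, 0], [0, 0, 0, 0, 0], [1, 1, 1, 0, 0]]
    print(majority_vote(noisy))                # [0, 0, 0, 1]
    print(integrate_by_threshold(noisy, 0.3))  # [0, 1, 0, 1]
    # Simulated imbalanced labeling: positives are recognized only 40% of the time.
    print(simulate_labels([1, 0, 1], n_labelers=5, sensitivity=0.4, specificity=0.9))
```

In the usage lines, the second example has a positive label frequency of 0.4, so majority voting labels it negative, while the lower threshold (0.3, chosen arbitrarily for illustration) labels it positive. When labelers are biased against the positive class, a threshold below 0.5 can recover such examples; when they are unbiased, 0.5 is the natural choice, which is consistent with the abstract's claim that PLAT performs close to majority voting in that case.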