A threshold method for imbalanced multiple noisy labeling

Authors:
Jing Zhang;Xindong Wu;Victor S. Sheng
Affiliations:
Hefei University of Technology, Hefei, China;University of Vermont, Burlington, VT and Hefei University of Technology, Hefei, China;University of Central Arkansas, Conway, AR
Venue:
Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
Year:
2013

Abstract

Internet-based crowdsourcing systems can be viewed as a kind of loosely coupled social network. With these systems, it is easy to collect multiple noisy labels for the same object when annotating data for supervised learning. Because non-expert labelers often lack expertise and dedication and have strong personal preferences, their labeling may be biased. This bias causes Imbalanced Multiple Noisy Labeling. In this paper, we propose an agnostic algorithm, Positive LAbel frequency Threshold (PLAT), to deal with imbalanced labeling. Because of the dynamics of social networks, in most cases no information about the quality of the labelers or the underlying class distributions can be acquired. PLAT requires no prior knowledge of the labelers' labeling quality, the underlying class distributions, or the level of labeling imbalance. Simulations on eight real-world datasets with different underlying class distributions demonstrate that PLAT not only deals effectively with imbalanced multiple noisy labeling, which off-the-shelf agnostic methods cannot cope with, but also performs nearly the same as majority voting when labelers have no bias.
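
To make the contrast between majority voting and a positive-label-frequency threshold concrete, the following is a minimal Python sketch. It is not the authors' PLAT procedure: the abstract does not say how PLAT chooses its threshold, so the `threshold` parameter, the function names, and the toy label sets below are all assumptions made for illustration. The sketch only shows that majority voting amounts to a fixed threshold of 0.5 on the fraction of positive labels, and that a lower threshold can recover positives when labelers are biased toward the negative class.

```python
from typing import List, Sequence


def positive_label_frequency(labels: Sequence[int]) -> float:
    """Fraction of positive (1) labels among the noisy labels of one example."""
    if not labels:
        raise ValueError("each example needs at least one label")
    return sum(labels) / len(labels)


def integrate_by_threshold(
    label_sets: Sequence[Sequence[int]], threshold: float
) -> List[int]:
    """Assign 1 when the positive label frequency exceeds `threshold`, else 0.

    `threshold` is a hypothetical, user-supplied value here; how PLAT
    estimates it is not described in the abstract.
    """
    return [
        1 if positive_label_frequency(labels) > threshold else 0
        for labels in label_sets
    ]


def majority_vote(label_sets: Sequence[Sequence[int]]) -> List[int]:
    """Majority voting expressed as a fixed threshold of 0.5 (ties go to 0)."""
    return integrate_by_threshold(label_sets, 0.5)


# Toy data (invented): five noisy binary labels per example, from labelers
# assumed to be biased toward the negative class.
label_sets = [
    [1, 0, 0, 1, 0],  # positive label frequency 0.4
    [0, 0, 0, 0, 0],  # positive label frequency 0.0
    [1, 1, 0, 1, 0],  # positive label frequency 0.6
    [0, 1, 0, 0, 0],  # positive label frequency 0.2
]

majority = majority_vote(label_sets)
lowered = integrate_by_threshold(label_sets, 0.3)

print("majority voting:", majority)
print("threshold 0.3:  ", lowered)
```

In this toy setting the first example receives a positive integrated label only under the lowered threshold, which is the kind of case a bias-aware threshold is meant to address.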