Simple Multiple Noisy Label Utilization Strategies

  • Authors: Victor S. Sheng
  • Venue: ICDM '11 Proceedings of the 2011 IEEE 11th International Conference on Data Mining
  • Year: 2011

Abstract

As outsourcing small tasks becomes easier, non-expert (imperfect) labels can be obtained at low cost, which makes it straightforward to collect multiple labels for the same data item. This paper studies strategies for using these multiple labels to improve the performance of supervised learning, based on two basic ideas: majority voting and pairwise solutions. Our experiments show several interesting results. The soft majority voting strategies reduce the bias and roughness of the directed hard majority voting strategy and improve on its performance. For binary classification, pairwise strategies avoid the bias completely by considering both sides (the potentially correct and the incorrect/noisy information). They perform very well whether few or many labels are available. However, the basic pairwise strategy can also retain the noise, so the improved variation that reduces the impact of the noisy information is recommended. All five strategies investigated are agnostic to labeling quality and can be applied directly to real-world applications. The experimental results show that some of them perform better than, or at least very close to, the gnostic (quality-aware) strategies.
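
The abstract does not spell out how each strategy is computed, so the following is only a minimal sketch of one plausible reading of the two basic ideas for binary labels. The function names, the use of label frequency as an example weight, and the 0/1 class encoding are all assumptions made for illustration, not the paper's definitions.

```python
from collections import Counter


def hard_majority_vote(labels):
    """Collapse a multiple label set into its single most frequent label.

    Ties go to the label that appears first in the input.
    """
    return Counter(labels).most_common(1)[0][0]


def soft_majority_vote(labels):
    """Return the majority label with a weight (assumed: its relative frequency).

    The weight keeps some of the disagreement that hard voting discards.
    """
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


def pairwise_examples(labels, classes=(0, 1)):
    """Return one weighted copy of the item per class (assumed weighting).

    Both sides are kept, so neither the majority nor the minority
    label is thrown away; each copy is weighted by its label frequency.
    """
    counts = Counter(labels)
    total = len(labels)
    return [(cls, counts[cls] / total) for cls in classes]


if __name__ == "__main__":
    noisy_labels = [1, 1, 0, 1, 0]  # five noisy labels for one data item
    print(hard_majority_vote(noisy_labels))
    print(soft_majority_vote(noisy_labels))
    print(pairwise_examples(noisy_labels))
```

Under these assumptions, the weighted outputs would be passed to any learner that accepts per-example weights. The sketch also shows why the pairwise form can retain noise: the minority label stays in the training set, only down-weighted.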