A Density-Based Re-ranking Technique for Active Learning for Data Annotations

  • Authors:
  • Jingbo Zhu; Huizhen Wang; Benjamin K. Tsou

  • Affiliations:
  • Natural Language Processing Laboratory, Northeastern University, Shenyang, P.R. China; Natural Language Processing Laboratory, Northeastern University, Shenyang, P.R. China; Language Information Sciences Research Centre, City University of Hong Kong, Hong Kong

  • Venue:
  • ICCPOL '09 Proceedings of the 22nd International Conference on Computer Processing of Oriental Languages. Language Technology for the Knowledge-based Economy
  • Year:
  • 2009

Abstract

Uncertainty sampling is one of the popular active learning techniques for data annotation; however, it often runs into problems when outliers are selected. To solve this problem, this paper proposes a density-based re-ranking technique, in which a density measure is adopted to determine whether an unlabeled example is an outlier. The motivation is to prefer examples that are not only the most informative in terms of the uncertainty measure but also the most representative in terms of the density measure. Experimental results of active learning for word sense disambiguation and text classification tasks on six real-world evaluation data sets show that the proposed density-based re-ranking technique can improve uncertainty sampling.
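
The re-ranking idea described in the abstract could be sketched roughly as follows. The abstract does not state which uncertainty measure, density measure, or combination rule the paper uses, so everything concrete here is an assumption made for illustration: entropy as the uncertainty measure, average cosine similarity to the k most similar pool examples as the density measure, and a two-stage rule that first keeps the most uncertain candidates and then orders them by density. The function names and the toy data are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy of a predicted class distribution (assumed uncertainty measure)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def density(i, vectors, k):
    """Assumed density measure: mean cosine similarity of example i to its k most similar neighbours."""
    sims = sorted(
        (cosine(vectors[i], v) for j, v in enumerate(vectors) if j != i),
        reverse=True,
    )
    top = sims[:k]
    return sum(top) / len(top) if top else 0.0

def rerank_by_density(vectors, probs, n_candidates, k):
    """Keep the n most uncertain pool examples, then order them by density (highest first)."""
    by_uncertainty = sorted(
        range(len(vectors)), key=lambda i: entropy(probs[i]), reverse=True
    )
    candidates = by_uncertainty[:n_candidates]
    return sorted(candidates, key=lambda i: density(i, vectors, k), reverse=True)

# Toy pool: examples 0-2 form a cluster, example 3 is an outlier.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]
# Hypothetical classifier outputs; example 3 is the most uncertain one.
probs = [[0.9, 0.1], [0.6, 0.4], [0.55, 0.45], [0.5, 0.5]]

most_uncertain = max(range(len(probs)), key=lambda i: entropy(probs[i]))
reranked = rerank_by_density(vectors, probs, n_candidates=2, k=2)
```

In this toy pool, plain uncertainty sampling would pick example 3, the outlier, whereas the re-ranked list puts example 2 first because it lies in the dense cluster. Whether the paper uses a two-stage rule like this one or some other way of combining the two measures is not stated in the abstract.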