Active learning with sampling by uncertainty and density for word sense disambiguation and text classification

  • Authors:
  • Jingbo Zhu;Huizhen Wang;Tianshun Yao;Benjamin K. Tsou

  • Affiliations:
  • Northeastern University, Shenyang, Liaoning, P.R.China;Northeastern University, Shenyang, Liaoning, P.R.China;Northeastern University, Shenyang, Liaoning, P.R.China;City University of Hong Kong, HK, P.R.China

  • Venue:
  • COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
  • Year:
  • 2008

Abstract

This paper addresses two issues in active learning. First, uncertainty sampling often fails because it selects outliers; to address this, the paper presents a new selective sampling technique, sampling by uncertainty and density (SUD), which uses a k-Nearest-Neighbor-based density measure to determine whether an unlabeled example is an outlier. Second, sampling by clustering (SBC) is applied to build a representative initial training set for active learning. Finally, we implement a new active learning algorithm that combines the SUD and SBC techniques. Experimental results on three real-world data sets show that our method outperforms competing methods, particularly in the early stages of active learning.
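
The abstract does not give formulas, so the following is only a rough sketch of how a SUD-style selection step might look. It assumes that uncertainty is the entropy of the classifier's class posteriors, that density is the mean cosine similarity of an example to its k most similar pool neighbors, and that the two are combined by multiplication. The function names (`entropy_uncertainty`, `knn_density`, `sud_select`) and the toy data are hypothetical and not taken from the paper.

```python
import numpy as np

def entropy_uncertainty(probs):
    """Shannon entropy of each row of class posteriors (higher = more uncertain)."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def knn_density(X, k):
    """Mean cosine similarity of each row to its k most similar other rows.

    Assumption: cosine similarity stands in for the similarity measure;
    a low value marks a likely outlier.
    """
    if not 0 < k < X.shape[0]:
        raise ValueError("k must satisfy 0 < k < number of examples")
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.clip(norms, 1e-12, None)
    sim = Xn @ Xn.T
    np.fill_diagonal(sim, -np.inf)       # exclude each example from its own neighbors
    topk = np.sort(sim, axis=1)[:, -k:]  # k largest similarities per row
    return topk.mean(axis=1)

def sud_select(probs, X, k=2, batch=1):
    """Rank unlabeled examples by uncertainty * density (assumed combination)."""
    score = entropy_uncertainty(probs) * knn_density(X, k)
    return np.argsort(-score)[:batch], score

# Toy pool: three examples in a tight cluster plus one outlier (index 3).
X = np.array([[1.0, 0.0], [1.0, 0.05], [1.0, 0.1], [0.0, 1.0]])
# Hypothetical posteriors: the outlier is the single most uncertain example.
probs = np.array([[0.9, 0.1], [0.55, 0.45], [0.8, 0.2], [0.5, 0.5]])

picked, scores = sud_select(probs, X, k=2, batch=1)
most_uncertain = int(np.argmax(entropy_uncertainty(probs)))
```

In this toy pool, plain uncertainty sampling would pick the outlier, whereas the density-weighted score prefers an uncertain example inside the cluster, which is the behavior the abstract attributes to SUD.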