An empirical study on selective sampling in active learning for splog detection

  • Authors:
  • Taichi Katayama; Takehito Utsuro; Yuuki Sato; Takayuki Yoshinaka; Yasuhide Kawada; Tomohiro Fukuhara

  • Affiliations:
  • University of Tsukuba, Tsukuba, Japan; University of Tsukuba, Tsukuba, Japan; University of Tsukuba, Tsukuba, Japan; Tokyo Denki University, Tokyo, Japan; Navix Co., Ltd., Tokyo, Japan; University of Tokyo, Kashiwa, Japan

  • Venue:
  • Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web
  • Year:
  • 2009

Abstract

This paper studies how to reduce the amount of human supervision needed to distinguish splogs (spam blogs) from authentic blogs when splog data sets are updated continuously, year by year. Following previous work on active learning, it empirically examines several selective sampling strategies for active learning with Support Vector Machines (SVMs), applied to the task of splog / authentic blog detection. As the confidence measure for SVM learning, we employ the distance from the separating hyperplane to each test instance, a measure that has been well studied in active learning for text classification. Unlike the results reported for active learning in text classification, in the splog / authentic blog detection task studied here, adding the least confident samples does not perform best.
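
The confidence measure described in the abstract can be illustrated with a minimal sketch. It assumes a linear decision function with hypothetical weights `w` and bias `b` (the abstract does not state the kernel or the exact strategies compared), and the function names, the `most_confident` strategy, and the toy pool are illustrative assumptions, not the authors' implementation. The sketch ranks unlabeled instances by their distance from the separating hyperplane and picks either the closest (least confident) or the farthest (most confident) ones for human labeling.

```python
import math


def margin_distance(w, b, x):
    """Unsigned distance from instance x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return abs(score) / norm


def select_samples(w, b, pool, k, strategy="least_confident"):
    """Return indices of k pool instances to send for human labeling.

    least_confident: instances closest to the hyperplane, closest first.
    most_confident:  instances farthest from the hyperplane, farthest first
                     (an assumed alternative strategy, for comparison only).
    """
    # Rank pool indices by ascending distance from the hyperplane.
    ranked = sorted(range(len(pool)), key=lambda i: margin_distance(w, b, pool[i]))
    if strategy == "least_confident":
        return ranked[:k]
    if strategy == "most_confident":
        return ranked[-k:][::-1]
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical toy model and unlabeled pool (not data from the paper).
w, b = [1.0, -1.0], 0.0
pool = [[3.0, 0.0], [0.5, 0.4], [0.0, 2.0], [1.0, 1.5]]

least = select_samples(w, b, pool, k=2)
most = select_samples(w, b, pool, k=2, strategy="most_confident")
```

In an active learning loop, the selected instances would be labeled by a human, added to the training set, and the classifier retrained before the next round of selection. Comparing the two strategies on held-out data is one way to check which one suits a given task.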