Semi-random subspace method for writeprint identification

  • Authors:
  • Zhi Liu; Zongkai Yang; Sanya Liu; Yinghui Shi

  • Affiliations:
  • National Engineering Research Center for E-Learning, 152 Luoyu Road, Wuhan, Hubei 430079, PR China (all four authors)

  • Venue:
  • Neurocomputing
  • Year:
  • 2013

Abstract

The anonymous nature of online message distribution raises a series of moral and legal issues. By analyzing the identity cues that people leave behind in their texts, i.e., their writeprint, potential authors can be identified individually. However, writeprint identification is a difficult learning task because of the high redundancy in the stylistic feature set and the high similarity of some authors' writing styles. In this paper, we propose a novel method, called semi-random subspace (Semi-RS), that addresses both problems simultaneously. Unlike the conventional random subspace method (RSM), which samples features from the whole feature set in a completely random way, Semi-RS randomly samples features from each individual-author feature set (IAFS) partitioned from the whole feature set. More specifically, we first divide the whole feature set into several IAFSs in a deterministic way, then construct a set of base classifiers on different feature subsets randomly sampled from each IAFS, and finally combine all base classifiers for the final decision. Experimental results on the benchmark dataset demonstrate the effectiveness of the proposed method, which improves on previously reported results. In addition, an analysis of algorithm diversity reveals that Semi-RS constructs more diverse base classifiers than conventional RSMs.
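The three steps described in the abstract (deterministic partition into IAFSs, random subspaces drawn within each IAFS, combination of base classifiers) can be sketched as follows, with a conventional RSM shown for contrast. This is a minimal illustration, not the authors' implementation: the abstract does not say how features are assigned to IAFSs, which base learner is used, or how decisions are combined. The mean-based partition, the nearest-centroid base classifier, majority voting, and the names `partition_features`, `n_per_iafs`, and `subspace_ratio` are all hypothetical choices made for the sketch.

```python
import random
from collections import Counter


def partition_features(X, y):
    """Hypothetical deterministic IAFS split: give each feature to the author
    whose training documents have the highest mean value for it."""
    authors = sorted(set(y))
    iafs = {a: [] for a in authors}
    for j in range(len(X[0])):
        means = {}
        for a in authors:
            vals = [row[j] for row, lab in zip(X, y) if lab == a]
            means[a] = sum(vals) / len(vals)
        # max() keeps the first maximum, so ties go to the earliest author
        iafs[max(authors, key=lambda a: means[a])].append(j)
    return iafs


class NearestCentroid:
    """Hypothetical base classifier restricted to one feature subset."""

    def __init__(self, features):
        self.features = features

    def fit(self, X, y):
        self.centroids = {}
        for a in set(y):
            rows = [[row[j] for j in self.features]
                    for row, lab in zip(X, y) if lab == a]
            self.centroids[a] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_one(self, x):
        v = [x[j] for j in self.features]

        def dist(c):
            return sum((p - q) ** 2 for p, q in zip(v, c))

        # sorted() makes tie-breaking between authors deterministic
        return min(sorted(self.centroids), key=lambda a: dist(self.centroids[a]))


def semi_rs_fit(X, y, n_per_iafs=5, subspace_ratio=0.5, seed=0):
    """Semi-RS sketch: random subspaces are drawn inside each IAFS only."""
    rng = random.Random(seed)
    ensemble = []
    for feats in partition_features(X, y).values():
        if not feats:
            continue  # an author may end up owning no features
        size = max(1, int(len(feats) * subspace_ratio))
        for _ in range(n_per_iafs):
            ensemble.append(NearestCentroid(rng.sample(feats, size)).fit(X, y))
    return ensemble


def rsm_fit(X, y, n_classifiers=10, subspace_ratio=0.5, seed=0):
    """Conventional RSM for contrast: subspaces drawn from the whole feature set."""
    rng = random.Random(seed)
    all_feats = list(range(len(X[0])))
    size = max(1, int(len(all_feats) * subspace_ratio))
    return [NearestCentroid(rng.sample(all_feats, size)).fit(X, y)
            for _ in range(n_classifiers)]


def ensemble_predict(ensemble, x):
    """Combine base classifiers by simple majority vote (an assumption)."""
    return Counter(clf.predict_one(x) for clf in ensemble).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy stylometric vectors for two authors (made-up data)
    X = [[5, 1, 0, 4], [6, 0, 1, 5], [0, 5, 4, 1], [1, 6, 5, 0]]
    y = ["A", "A", "B", "B"]
    print(partition_features(X, y))  # {'A': [0, 3], 'B': [1, 2]}
    print(ensemble_predict(semi_rs_fit(X, y), [5, 1, 1, 4]))  # A
```

The design point the sketch tries to capture is that every Semi-RS base classifier sees only features drawn from one author's partition, whereas an RSM subspace can mix features from any part of the feature set.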