Exploring high-level features for detecting cyberpedophilia

  • Authors:
  • Dasha Bogdanova;Paolo Rosso;Thamar Solorio

  • Affiliations:
  • -;-;-

  • Venue:
  • Computer Speech and Language
  • Year:
  • 2014

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper, we suggest a list of high-level features and study their applicability in detection of cyberpedophiles. We used a corpus of chats downloaded from http://www.perverted-justice.com and two negative datasets of different nature: cybersex logs available online, and the NPS chat corpus. The classification results show that the NPS data and the pedophiles' conversations can be accurately discriminated from each other with character n-grams, while in the more complicated case of cybersex logs there is need for high-level features to reach good accuracy levels. In this latter setting our results show that features that model behaviour and emotion significantly outperform the low-level ones, and achieve a 97% accuracy.