Detecting cyberbullying: query terms and techniques

  • Authors: April Kontostathis; Kelly Reynolds; Andy Garron; Lynne Edwards

  • Affiliations: Ursinus College, Collegeville, PA; Lehigh University, Bethlehem, PA; University of Maryland, College Park, MD; Ursinus College, Collegeville, PA

  • Venue: Proceedings of the 5th Annual ACM Web Science Conference

  • Year: 2013

Abstract

In this paper we describe a close analysis of the language used in cyberbullying. Our corpus is a collection of posts from Formspring.me, a social networking site where users can ask questions of other users. The site appeals primarily to teens and young adults, and its cyberbullying content is dense: between 7% and 14% of the posts we have analyzed contain cyberbullying content. The results presented in this paper are twofold. Our first experiments were designed to develop an understanding of both the specific words used by cyberbullies and the context surrounding these words. We have identified the most commonly used cyberbullying terms and have developed queries that can be used to detect cyberbullying content. Five of our queries achieve an average precision of 91.25% at rank 100. In our second set of experiments we extended this work with a supervised machine learning approach to detecting cyberbullying. The machine learning experiments identified additional terms that are consistent with cyberbullying content, as well as an additional querying technique that accurately assigns scores to posts from Formspring.me. The posts with the highest scores are shown to have a high density of cyberbullying content.
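
The abstract reports precision at rank 100 but does not say how it is computed. The following is a minimal sketch, assuming the standard definition of precision at rank k: the fraction of the k highest-scoring posts that are labeled as cyberbullying. The scoring function `score_post` is a hypothetical stand-in that counts matches against a set of query terms; it is not the authors' query formulation, and all function names here are illustrative.

```python
from typing import AbstractSet, List, Sequence


def score_post(post: str, query_terms: AbstractSet[str]) -> int:
    """Hypothetical scorer: count tokens in the post that match a query term."""
    tokens = (token.strip(".,!?") for token in post.lower().split())
    return sum(1 for token in tokens if token in query_terms)


def rank_labels(posts: Sequence[str],
                labels: Sequence[bool],
                query_terms: AbstractSet[str]) -> List[bool]:
    """Return the labels reordered by descending post score.

    Python's sort is stable, so posts with equal scores keep their input order.
    """
    paired = sorted(zip(posts, labels),
                    key=lambda pair: score_post(pair[0], query_terms),
                    reverse=True)
    return [label for _, label in paired]


def precision_at_k(ranked_labels: Sequence[bool], k: int) -> float:
    """Fraction of the top-k ranked posts labeled as cyberbullying.

    Divides by k even when fewer than k posts are available, which is the
    usual convention for precision at a fixed rank.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(ranked_labels[:k]) / k
```

With labeled posts in hand, `precision_at_k(rank_labels(posts, labels, terms), 100)` would give the precision at rank 100 for one query under these assumptions.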