In this paper we describe a close analysis of the language used in cyberbullying. Our corpus is a collection of posts from Formspring.me, a social networking site where users can ask questions of other users. The site appeals primarily to teens and young adults, and its cyberbullying content is dense: between 7% and 14% of the posts we analyzed contain cyberbullying content. The results presented in this paper are twofold. Our first experiments were designed to develop an understanding of both the specific words used by cyberbullies and the context surrounding these words. We identified the most commonly used cyberbullying terms and developed queries that can be used to detect cyberbullying content. Five of our queries achieve an average precision of 91.25% at rank 100. In our second set of experiments we extended this work with a supervised machine learning approach to detecting cyberbullying. The machine learning experiments identified additional terms that are consistent with cyberbullying content, as well as an additional querying technique that accurately assigned scores to posts from Formspring.me. The posts with the highest scores are shown to have a high density of cyberbullying content.
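The precision-at-rank figure quoted above can be illustrated with a small sketch. It assumes each query returns a ranked list of posts and that each post carries a human label (True when it contains cyberbullying content). The function names and the toy data are hypothetical and are not taken from the paper.

```python
def precision_at_k(ranked_labels, k):
    """Fraction of the top-k ranked posts labeled as cyberbullying.

    ranked_labels: list of booleans, best-scoring post first (assumed format).
    Divides by k even when fewer than k posts were returned, so missing
    results count as misses.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranked_labels[:k]
    return sum(1 for label in top if label) / k


def mean_precision_at_k(runs, k):
    """Average precision at rank k over several queries."""
    if not runs:
        raise ValueError("at least one query run is required")
    return sum(precision_at_k(run, k) for run in runs) / len(runs)


# Toy data: two hypothetical queries, evaluated at rank 2.
runs = [[True, False], [True, True]]
print(mean_precision_at_k(runs, 2))
```

In this reading, a precision of 91.25% at rank 100 averaged over five queries would mean that, on average, roughly 91 of the top 100 posts returned by each query were labeled as cyberbullying.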