The Journal of Machine Learning Research
Smokey: automatic recognition of hostile messages
AAAI'97/IAAI'97 Proceedings of the fourteenth national conference on artificial intelligence and ninth conference on Innovative applications of artificial intelligence
Enhanced sentiment learning using Twitter hashtags and smileys
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Modeling of stylistic variation in social media with stretchy patterns
DIALECTS '11 Proceedings of the First Workshop on Algorithms and Resources for Modelling of Dialects and Language Varieties
Offensive language detection using multi-level classification
AI'10 Proceedings of the 23rd Canadian conference on Advances in Artificial Intelligence
Proceedings of the 17th ACM conference on Computer supported cooperative work & social computing
Hi-index | 0.00 |
In this paper, we propose a novel semi-supervised approach for detecting profanity-related offensive content in Twitter. Our approach exploits linguistic regularities in profane language via statistical topic modeling on a huge Twitter corpus, and detects offensive tweets using automatically these generated features. Our approach performs competitively with a variety of machine learning (ML) algorithms. For instance, our approach achieves a true positive rate (TP) of 75.1% over 4029 testing tweets using Logistic Regression, significantly outperforming the popular keyword matching baseline, which has a TP of 69.7%, while keeping the false positive rate (FP) at the same level as the baseline at about 3.77%. Our approach provides an alternative to large scale hand annotation efforts required by fully supervised learning approaches.