Robust kaomoji detection in Twitter

Authors: Steven Bedrick, Russell Beckley, Brian Roark, Richard Sproat
Affiliation: Oregon Health & Science University, Portland, Oregon (all authors)
Venue: LSM '12: Proceedings of the Second Workshop on Language in Social Media
Year: 2012

Abstract

In this paper, we address the robust detection of a highly productive class of Asian-style emoticons known as facemarks or kaomoji. We demonstrate the frequency and productivity of these sequences in social media such as Twitter. Previous approaches to detecting and analyzing kaomoji have limited the range of phenomena their methods could detect, and have been evaluated on largely monolingual sets (e.g., Japanese blogs). We find that these emoticons occur broadly across many languages, so our approach is language-agnostic. Rather than relying on regular expressions over a predefined set of likely tokens, we build weighted context-free grammars that reward graphical affinity and symmetry among whatever symbols are used to construct the emoticon.
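
To make the idea of rewarding symmetry concrete, the following is a minimal sketch and not the grammar used in the paper. It assumes a toy weighted grammar with rules S -> a S b, S -> a, and S -> (empty), where the weight of S -> a S b depends only on whether a and b mirror each other. The mirror pairs, the weight values, and the names MIRROR, pair_weight, and symmetry_score are all hypothetical choices made for illustration; graphical affinity between symbols is not modeled at all.

```python
# Illustrative mirror pairs only; the paper's actual symbol classes are not
# given in the abstract.
_PAIRS = ["()", "[]", "{}", "<>", "/\\"]
MIRROR = {}
for a, b in _PAIRS:
    MIRROR[a] = b
    MIRROR[b] = a


def pair_weight(left, right):
    """Weight of the toy rule S -> left S right (values are assumptions)."""
    if MIRROR.get(left) == right:
        return 2.0   # mirrored pair, e.g. ( ... )
    if left == right:
        return 1.0   # identical pair, e.g. ^ ... ^
    return -1.0      # asymmetric pair


def symmetry_score(s):
    """Sum the rule weights of the single derivation of s under the toy
    grammar, pairing characters from the outside in. A middle character,
    if any, is consumed by S -> a at no cost."""
    score = 0.0
    i, j = 0, len(s) - 1
    while i < j:
        score += pair_weight(s[i], s[j])
        i += 1
        j -= 1
    return score


if __name__ == "__main__":
    for candidate in ["(^_^)", "(>_<)", "hello"]:
        print(candidate, symmetry_score(candidate))
```

Because the weights depend on the relationship between symbols rather than on a fixed token list, a scorer of this shape does not need to have seen a particular emoticon before, which is the property the abstract contrasts with regular expressions over predefined tokens. A real detector would also need to search over candidate spans in a tweet and tolerate imperfect symmetry.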