Robust kaomoji detection in Twitter

  • Authors: Steven Bedrick, Russell Beckley, Brian Roark, Richard Sproat
  • Affiliations: Oregon Health & Science University, Portland, Oregon (all four authors)
  • Venue: LSM '12 Proceedings of the Second Workshop on Language in Social Media
  • Year: 2012

Abstract

In this paper, we look at the problem of robust detection of a very productive class of Asian-style emoticons, known as facemarks or kaomoji. We demonstrate the frequency and productivity of these sequences in social media such as Twitter. Previous approaches to the detection and analysis of kaomoji have limited the range of phenomena their methods could detect, and have been evaluated on largely monolingual sets (e.g., Japanese blogs). We find that these emoticons occur broadly in many languages; hence our approach is language-agnostic. Rather than relying on regular expressions over a predefined set of likely tokens, we build weighted context-free grammars that reward graphical affinity and symmetry within whatever symbols are used to construct the emoticon.
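
To make the idea of rewarding symmetry concrete, here is a toy sketch in Python. It is not the grammar from the paper. It assumes a single-nonterminal weighted grammar, S → a S b | c | ε, in which the weight of each S → a S b rule depends on whether a and b mirror each other. The mirror table, the weight values, the minimum length, and the names `pair_weight` and `derivation_weight` are all hypothetical choices made for illustration. Because this grammar pairs characters strictly from the outside in, each string has exactly one derivation, so its weight is simply the product of the rule weights.

```python
# Toy sketch only: the rule set, mirror pairs, and weights below are
# illustrative assumptions, not the grammar or parameters from the paper.
MIRROR_PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">", "\\": "/", "/": "\\"}

W_MIRROR = 1.0       # S -> a S b, where b is the mirror image of a
W_SAME_SYMBOL = 0.9  # S -> a S a, where a is a non-alphanumeric symbol
W_SAME_ALNUM = 0.3   # S -> a S a, where a is a letter or digit
W_MISMATCH = 0.1     # S -> a S b, with no visual relation between a and b

MIN_LENGTH = 3       # arbitrary cutoff so trivial strings score zero


def pair_weight(left: str, right: str) -> float:
    """Weight of the rule that wraps the inner string in `left` ... `right`."""
    if MIRROR_PAIRS.get(left) == right:
        return W_MIRROR
    if left == right:
        return W_SAME_ALNUM if left.isalnum() else W_SAME_SYMBOL
    return W_MISMATCH


def derivation_weight(candidate: str) -> float:
    """Product of rule weights for the unique outside-in derivation."""
    if len(candidate) < MIN_LENGTH:
        return 0.0
    weight = 1.0
    lo, hi = 0, len(candidate) - 1
    while lo < hi:
        weight *= pair_weight(candidate[lo], candidate[hi])
        lo += 1
        hi -= 1
    # An unpaired middle character (S -> c) contributes weight 1.0.
    return weight
```

Under these assumed weights, a symmetric candidate such as `(^_^)` receives a much higher weight than an ordinary word such as `hello`, and no list of known emoticon tokens is consulted. A practical detector would additionally need to find candidate spans in running text and handle ambiguity between competing derivations, which this sketch does not attempt.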