Social (distributed) language modeling, clustering and dialectometry

  • Authors: David Ellis
  • Affiliations: Facebook, Palo Alto, CA
  • Venue: TextGraphs-4: Proceedings of the 2009 Workshop on Graph-based Methods for Natural Language Processing
  • Year: 2009

Abstract

We present ongoing work on a scalable, distributed implementation of over 200 million individual language models, each capturing a single user's dialect in a given language (multilingual users have several models). These models have a variety of practical applications, ranging from spam detection to speech recognition and dialectometrical methods on the social graph. Users should be able to view any content in their language (even if it is spoken by a small population), and to browse our site with an appropriately translated interface (automatically generated for locales with little crowd-sourced community effort).