Smoothing techniques for adaptive online language models: topic tracking in tweet streams

  • Authors:
  • Jimmy Lin, Rion Snow, William Morgan

  • Affiliations:
  • Twitter, San Francisco, CA, USA (all authors)

  • Venue:
  • Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
  • Year:
  • 2011

Abstract

We are interested in the problem of tracking broad topics such as "baseball" and "fashion" in continuous streams of short texts, exemplified by tweets from the microblogging service Twitter. The task is conceived as a language modeling problem where per-topic models are trained using hashtags in the tweet stream, which serve as proxies for topic labels. Simple perplexity-based classifiers are then applied to filter the tweet stream for topics of interest. Within this framework, we evaluate, both intrinsically and extrinsically, smoothing techniques for integrating "foreground" models (to capture recency) and "background" models (to combat sparsity), as well as different techniques for retaining history. Experiments show that unigram language models smoothed using a normalized extension of stupid backoff, combined with a simple queue for history retention, perform well on the task.
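To make the pipeline concrete, below is a minimal Python sketch of a perplexity-based topic filter in the spirit of the approach the abstract describes. It is an assumption-laden illustration, not the paper's implementation: the foreground model is a unigram counter over recent hashtag-labeled tweets retained in a simple FIFO queue, and scoring backs off from foreground to an add-one-smoothed background estimate with a fixed discount, a simplification of the paper's normalized extension of stupid backoff. The class name, parameters (`history_size`, `alpha`, `threshold`), and their values are all hypothetical.

```python
import math
from collections import Counter, deque


class TopicFilter:
    """Perplexity-based filter for one topic (e.g., "baseball").

    Sketch only: foreground counts come from recent tweets bearing the
    topic's hashtag (a proxy topic label); background counts come from
    the general stream. Scoring is a stupid-backoff-style mix, not the
    paper's exact normalized variant.
    """

    def __init__(self, history_size=10_000, alpha=0.4, threshold=500.0):
        self.history = deque()          # FIFO queue of on-topic tweets
        self.history_size = history_size
        self.alpha = alpha              # backoff discount (assumed value)
        self.threshold = threshold      # perplexity cutoff (assumed value)
        self.fg = Counter()             # "foreground" model: captures recency
        self.bg = Counter()             # "background" model: combats sparsity
        self.fg_total = 0
        self.bg_total = 0

    def observe_background(self, tokens):
        """Update the background model from the general tweet stream."""
        self.bg.update(tokens)
        self.bg_total += len(tokens)

    def observe_topic(self, tokens):
        """Add a hashtag-labeled tweet; evict the oldest once full."""
        self.history.append(tokens)
        self.fg.update(tokens)
        self.fg_total += len(tokens)
        if len(self.history) > self.history_size:
            old = self.history.popleft()
            self.fg.subtract(old)       # keep the foreground model recent
            self.fg_total -= len(old)

    def prob(self, word):
        """Foreground estimate if the word was seen on-topic; otherwise a
        discounted, add-one-smoothed background estimate."""
        if self.fg[word] > 0:
            return self.fg[word] / self.fg_total
        vocab = len(self.bg) + 1        # +1 reserves mass for unseen words
        return self.alpha * (self.bg[word] + 1) / (self.bg_total + vocab)

    def perplexity(self, tokens):
        if not tokens:
            return float("inf")
        log_prob = sum(math.log(self.prob(w)) for w in tokens)
        return math.exp(-log_prob / len(tokens))

    def on_topic(self, tokens):
        """Low perplexity under the topic model => likely on-topic."""
        return self.perplexity(tokens) < self.threshold


# Toy usage with made-up tweets:
f = TopicFilter()
f.observe_topic("yankees walk off in the ninth #baseball".split())
f.observe_background("stuck in traffic again this morning".split())
print(f.on_topic("yankees clinch the pennant".split()))
```

The queue mirrors the history-retention strategy the abstract reports working well: evicting the oldest tweet's counts keeps the foreground model focused on recent usage, letting the filter adapt as a topic's vocabulary drifts over time.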