We are interested in the problem of tracking broad topics such as "baseball" and "fashion" in continuous streams of short texts, exemplified by tweets from the microblogging service Twitter. The task is conceived as a language modeling problem in which per-topic models are trained using hashtags in the tweet stream, which serve as proxies for topic labels. Simple perplexity-based classifiers are then applied to filter the tweet stream for topics of interest. Within this framework, we evaluate, both intrinsically and extrinsically, smoothing techniques for integrating "foreground" models (to capture recency) and "background" models (to combat sparsity), as well as different techniques for retaining history. Experiments show that unigram language models smoothed with a normalized extension of stupid backoff, combined with a simple queue for history retention, perform well on the task.
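The pipeline described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the formulation evaluated in the experiments: the class name `QueueTopicModel`, the back-off weight `alpha=0.4`, the add-one smoothing of the background model, the single `<unk>` token for out-of-vocabulary words, and the particular normalization are all choices made here for the example. A word seen in the foreground queue is scored by its relative frequency there; any other word backs off to the background probability scaled by `alpha`; the scores are then divided by their sum so that they form a proper distribution.

```python
import math
from collections import Counter, deque

UNK = "<unk>"  # assumption: all out-of-vocabulary words share one token


class QueueTopicModel:
    """Unigram topic model: a bounded queue of recent tweets (foreground)
    that backs off to a fixed background model. Illustrative sketch only."""

    def __init__(self, background_counts, max_tweets=1000, alpha=0.4):
        self.bg = Counter(background_counts)
        self.bg.setdefault(UNK, 0)
        self.bg_total = sum(self.bg.values())
        self.alpha = alpha            # back-off weight (assumed value)
        self.max_tweets = max_tweets  # queue capacity (assumed value)
        self.history = deque()        # recent tweets, oldest first
        self.fg = Counter()           # foreground counts over the queue
        self.fg_total = 0

    def _norm(self, word):
        # Map words outside the background vocabulary to UNK.
        return word if word in self.bg else UNK

    def _p_bg(self, word):
        # Add-one smoothed background probability over the closed vocabulary.
        return (self.bg[word] + 1) / (self.bg_total + len(self.bg))

    def update(self, tokens):
        """Add a hashtag-labelled tweet; evict the oldest once the queue is full."""
        tokens = [self._norm(t) for t in tokens]
        self.history.append(tokens)
        self.fg.update(tokens)
        self.fg_total += len(tokens)
        if len(self.history) > self.max_tweets:
            old = self.history.popleft()
            self.fg.subtract(old)
            self.fg_total -= len(old)
            self.fg += Counter()  # drops entries whose count fell to zero

    def prob(self, word):
        """Normalized back-off probability of a single word."""
        w = self._norm(word)
        if self.fg_total == 0:
            return self._p_bg(w)
        # Normalizer: foreground mass (1) plus the scaled background mass
        # of all words that are absent from the foreground.
        seen_bg_mass = sum(self._p_bg(t) for t in self.fg)
        z = 1.0 + self.alpha * (1.0 - seen_bg_mass)
        if self.fg[w] > 0:
            raw = self.fg[w] / self.fg_total
        else:
            raw = self.alpha * self._p_bg(w)
        return raw / z

    def perplexity(self, tokens):
        """Per-word perplexity of a tweet under this topic model."""
        tokens = list(tokens)
        if not tokens:
            return float("inf")
        log_sum = sum(math.log(self.prob(t)) for t in tokens)
        return math.exp(-log_sum / len(tokens))

    def accepts(self, tokens, threshold):
        """Perplexity-based filter: keep tweets the model finds unsurprising."""
        return self.perplexity(tokens) < threshold
```

In use, one such model would be kept per topic, `update` would be called on each incoming tweet carrying the topic's hashtag, and `accepts` would be applied to unlabelled tweets. The threshold is a free parameter here and would need to be tuned on held-out data.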