Topical anomaly detection from Twitter stream

Authors:
Pramod Anantharam; Krishnaprasad Thirunarayan; Amit Sheth
Affiliations:
Wright State University, Dayton, OH (all authors)
Venue:
Proceedings of the 3rd Annual ACM Web Science Conference
Year:
2012

Abstract

In this paper, we spot topically anomalous tweets in Twitter streams by analyzing the content of the documents pointed to by the URLs in the tweets, rather than relying on the tweets' own text. Existing approaches to anomaly detection ignore such URLs, thereby missing opportunities to detect off-topic tweets. Specifically, we measure the divergence between the claimed topic of a tweet, as reflected by its hashtags, and its actual topic, as reflected by the content of the referenced document. Our approach avoids the need for labeled samples by selecting documents from reliable sources, gleaned from the URLs present in the tweets. These documents serve as a reference against which the documents behind unknown URLs in incoming tweets are compared, improving reliability, scalability, and adaptability to rapidly changing topics. We evaluate our approach on three events and show that it can find topical inconsistencies not detectable by existing approaches.
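The comparison step described above can be illustrated with a minimal sketch. This is not the authors' method. It assumes a bag-of-words cosine similarity as the divergence measure, a per-hashtag topic profile aggregated from documents behind trusted URLs, and a hypothetical cutoff `max_divergence`. All function names and sample texts are invented for illustration, and fetching the linked documents is left out.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Lowercased bag-of-words term counts for one document."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def topic_profile(reference_docs: list[str]) -> Counter:
    """Aggregate term counts of documents from trusted URLs sharing one hashtag (assumed)."""
    profile = Counter()
    for doc in reference_docs:
        profile.update(term_vector(doc))
    return profile


def divergence(doc_text: str, profile: Counter) -> float:
    """Divergence of a linked document from the hashtag's profile: 1 - cosine similarity."""
    return 1.0 - cosine(term_vector(doc_text), profile)


def is_topically_anomalous(doc_text: str, profile: Counter,
                           max_divergence: float = 0.9) -> bool:
    """Flag the tweet if its linked document diverges too far (hypothetical threshold)."""
    return divergence(doc_text, profile) > max_divergence


# Invented example: documents behind trusted URLs tagged with a hypothetical #storm hashtag.
reference = [
    "Hurricane makes landfall with strong winds and flooding",
    "Flooding and power outages reported as hurricane hits coast",
]
profile = topic_profile(reference)

on_topic = "Storm surge flooding leaves hurricane coast without power"
off_topic = "Cheap designer watches on sale, free shipping today"

print(is_topically_anomalous(on_topic, profile))   # False
print(is_topically_anomalous(off_topic, profile))  # True
```

A linked spam page that shares no vocabulary with the trusted documents gets the maximum divergence of 1.0, while a document that shares topical terms stays below the cutoff. A real system would likely use a weighted representation (for example TF-IDF) and a tuned threshold; plain term counts are used here only to keep the sketch dependency-free.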