On-line social networks have become a massive communication and information channel for users world-wide. In particular, the microblogging platform Twitter is characterized by short-text message exchanges at extremely high rates. In this type of scenario, the detection of emerging topics in text streams becomes an important research area, essential for identifying relevant new conversation topics, such as breaking news and trends. Although emerging topic detection in text is a well-established research area, its application to large volumes of streaming text data is quite novel, which makes scalability, efficiency and speed the key aspects of any emerging topic detection algorithm in this type of environment. Our research addresses this problem by focusing on detecting significant and unusual bursts in keyword arrival rates, or bursty keywords. We propose a scalable and fast on-line method that uses normalized individual frequency signals per term and a windowing variation technique. This method reports keyword bursts, which can be composed of single or multiple terms, ranked according to their importance. The average complexity of our method is O(n log n), where n is the number of messages in the time window. This complexity allows our approach to scale to large streaming datasets. If bursts are only detected and not ranked, the algorithm has linear complexity O(n), making it the fastest among current state-of-the-art methods. We validate our approach by comparing its performance to that of similar systems using the TREC Tweet 2011 Challenge tweets, obtaining a 91% match with LDA, an off-line gold standard used in similar evaluations. In addition, we study Twitter messages related to the Super Bowl football events in 2011 and 2013.
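The general idea of comparing normalized per-term frequencies across consecutive time windows can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function names, the whitespace tokenization, the use of only two windows, and the `min_increase` threshold are all assumptions made for illustration. It handles single terms only, and its cost is measured in terms rather than messages.

```python
from collections import Counter


def relative_frequencies(messages):
    """Map each term to its share of all term occurrences in one window."""
    # Assumption: naive lowercase whitespace tokenization.
    counts = Counter(term for msg in messages for term in msg.lower().split())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: count / total for term, count in counts.items()}


def bursty_keywords(prev_window, curr_window, min_increase=0.1, rank=True):
    """Return (term, increase) pairs whose normalized frequency rose sharply.

    Assumption: a term counts as "bursty" when its relative frequency grows
    by at least `min_increase` between two consecutive windows.
    """
    prev = relative_frequencies(prev_window)
    curr = relative_frequencies(curr_window)

    # Detection: one linear pass over the terms of the current window.
    bursts = []
    for term, freq in curr.items():
        increase = freq - prev.get(term, 0.0)
        if increase >= min_increase:
            bursts.append((term, increase))

    # Ranking: sorting the detected bursts adds the log factor.
    if rank:
        bursts.sort(key=lambda pair: (-pair[1], pair[0]))
    return bursts


if __name__ == "__main__":
    previous = ["good morning everyone", "coffee time this morning"]
    current = ["earthquake just now", "earthquake felt downtown", "big earthquake here"]
    for term, increase in bursty_keywords(previous, current, min_increase=0.2):
        print(term, round(increase, 3))
```

In this sketch, detection is a single pass over the terms of the current window, and the optional sort is what adds the logarithmic factor. That mirrors the distinction the abstract draws between detecting bursts only and detecting and ranking them.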